We study estimation of a multivariate function $f : \mathbf{R}^d \to \mathbf{R}$ when
the observations are available from the function $Af$, where $A$ is a known
linear operator. Both the Gaussian white noise model and density
estimation are studied. We define an $L_2$ empirical risk functional,
which is used to define a $\delta$-net minimizer and a dense empirical risk
minimizer. Upper bounds for the mean integrated squared error of
the estimators are given. The upper bounds show how the difficulty
of the estimation depends on the operator through the norm of the
adjoint of the inverse of the operator, and on the underlying function
class through the entropy of the class. Corresponding lower bounds
are also derived. As examples we consider convolution operators and
the Radon transform. In these examples the estimators achieve the
optimal rates of convergence. Furthermore, a new type of oracle
inequality is given for inverse problems in additive models.

1. Introduction. We consider estimation of a function $f : \mathbf{R}^d \to \mathbf{R}$
when a linear transform $Af$ of the function is observed under stochastic
noise. We consider both the Gaussian white noise model and density
estimation with i.i.d. observations. We study two estimators: a $\delta$-net estimator,
which minimizes the $L_2$ empirical risk over a minimal $\delta$-net of a function
class, and a dense empirical risk minimizer, which minimizes the empirical
risk over the whole function class without restricting the minimization to
a $\delta$-net. We call this estimator "dense minimizer" because it is defined as a
minimizer over a possibly uncountable function class. The $\delta$-net estimator
is more universal: it may be applied also for unsmooth functions and for
severely ill-posed operators. On the other hand, the dense empirical
minimizer is expected to work only for relatively smooth cases (the entropy
integral has to converge). But because the minimization in the calculation
of this estimator is not restricted to a $\delta$-net, we have available a larger
toolbox of algorithms for finding (an approximation of) the minimizer of the
empirical risk.

Let $(\mathbf{Y}, \mathcal{Y}, \nu)$ be a Borel space and let $A : L_2(\mathbf{R}^d) \to L_2(\mathbf{Y})$ be a linear
operator, where $L_2(\mathbf{R}^d)$ is the space of square integrable functions $f : \mathbf{R}^d \to \mathbf{R}$
(with respect to the Lebesgue measure), and $L_2(\mathbf{Y})$ is the space of square
integrable functions $g : \mathbf{Y} \to \mathbf{R}$ (with respect to the measure $\nu$). In the density
estimation model we have i.i.d. observations

(1) $Y_1, \ldots, Y_n \in \mathbf{Y}$,

with common density function $Af : \mathbf{Y} \to \mathbf{R}$, where $f : \mathbf{R}^d \to \mathbf{R}$ is a density
function which we want to estimate. In the Gaussian white noise model the
observation is a realization of the process

(2) $dY_n(y) = (Af)(y)\,dy + n^{-1/2}\,dW(y)$, $y \in \mathbf{Y}$,

where $W(y)$ is the Brownian process on $\mathbf{Y}$, that is, for $h_1, h_2 \in L_2(\mathbf{Y})$,
the random vector $(\int_{\mathbf{Y}} h_1\,dW, \int_{\mathbf{Y}} h_2\,dW)$ is a 2-dimensional Gaussian
random vector with zero mean, marginal variances $\|h_1\|_2^2$, $\|h_2\|_2^2$, and
covariance $\int_{\mathbf{Y}} h_1 h_2\,d\nu$. (In our examples $\mathbf{Y}$ is either the Euclidean space or the
product of the real line with the unit sphere, so that the existence of the
Brownian process is guaranteed.) We want to estimate the signal function
$f : \mathbf{R}^d \to \mathbf{R}$. The Gaussian white noise model is very useful in presenting
the basic mathematical ideas in a transparent way. For the $\delta$-net estimator
the treatment is almost identical for the Gaussian white noise model and
for density estimation, but when we consider dense empirical risk
minimization, then in the density estimation model we need to use
bracketing numbers and empirical entropies with bracketing, instead of the usual
$L_2$ entropies. Our results for the Gaussian white noise model can also serve
as a first step for getting analogous results for inverse problems in regression
or in other statistical models.

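The $L_2$ empirical risk is defined by

(3) $\gamma_n(g) = \begin{cases} -2\int_{\mathbf{Y}} (Qg)\,dY_n + \|g\|_2^2, & \text{Gaussian white noise,} \\ -2n^{-1}\sum_{i=1}^n (Qg)(Y_i) + \|g\|_2^2, & \text{density estimation,} \end{cases}$

where $Q$ is the adjoint of the inverse of $A$:

(4) $\int_{\mathbf{R}^d} (A^{-1}h)\,g = \int_{\mathbf{Y}} h\,(Qg)\,d\nu$,

for $h \in L_2(\mathbf{Y})$, $g \in L_2(\mathbf{R}^d)$. The operator $Q = (A^{-1})^*$ has the domain
$L_2(\mathbf{R}^d)$, similarly as $A$. Minimizing $\|\hat f - f\|_2^2$ with respect to estimators $\hat f$
is equivalent to minimizing $\|\hat f - f\|_2^2 - \|f\|_2^2$, and we have, in the Gaussian
white noise model,

(5) $\|\hat f - f\|_2^2 - \|f\|_2^2 = -2\int_{\mathbf{R}^d} f\hat f + \|\hat f\|_2^2$
    $= -2\int_{\mathbf{Y}} (Af)(Q\hat f)\,d\nu + \|\hat f\|_2^2$
    $\approx -2\int_{\mathbf{Y}} (Q\hat f)\,dY_n + \|\hat f\|_2^2$
    $= \gamma_n(\hat f)$.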

The usual least squares estimator is defined as a minimizer of the criterion

(6) $\|Ag - Af\|_2^2 - \|Af\|_2^2 \approx -2\int_{\mathbf{Y}} (Ag)\,dY_n + \|Ag\|_2^2$.

See, for example, O'Sullivan. In density estimation the log-likelihood
empirical risk has been more common than the $L_2$ empirical risk, and in
the setting of inverse problems the log-likelihood is defined as $\gamma_n(g) =
-\sum_{i=1}^n \log (Ag)(Y_i)$, analogously to (6). These alternative definitions of
the empirical risk do not seem to lead to such an elegant theory as the
empirical risk in (3). The empirical risk in (3) has been used in deconvolution
problems for projection estimators by Comte et al. (2005).

We give upper bounds for the mean integrated squared error (MISE)
of the estimators. The upper bounds characterize how the rates of
convergence depend on the entropy of the underlying function class $\mathcal{F}$ and
on smoothness properties of the operator $A$. Previously such
characterizations have been given (to our knowledge) in inverse problems only
for the case of estimating real valued linear functionals $L$. In these cases
the rates of convergence are determined by the modulus of continuity of
the functional, $\omega(\epsilon) = \sup\{L(f) : f \in \mathcal{F}, \|Af\|_2 \le \epsilon\}$, see Donoho & Low
(1992). For the case of estimating the whole function with a global loss
function the rates of convergence depend on the largeness of the underlying
function class in terms of the entropy and capacity, see Cencov (1972),
Le Cam (1973), Ibragimov & Hasminskii (1980), Ibragimov & Hasminskii
(1981), Birgé (1983), Hasminskii & Ibragimov (1990), Barron & Yang (1999),
Ibragimov (2004). $\delta$-net estimators were considered e.g. by van der Laan et al.
(2004). These papers consider direct statistical problems. We show that for
inverse statistical problems the rate of convergence depends on the operator
through the operator norm $\varrho(Q, \mathcal{F}_\delta)$ of $Q$ over a minimal $\delta$-net $\mathcal{F}_\delta$, see (9)
for the definition of $\varrho(Q, \mathcal{F}_\delta)$. More precisely, the convergence rate $\psi_n$ of the
$\delta$-net estimator is the solution to the equation

$\psi_n^2 = n^{-1}\varrho^2(Q, \mathcal{F}_{\psi_n}) \log(\#\mathcal{F}_{\psi_n})$,

where $\#\mathcal{F}_\delta$ is the cardinality of a minimal $\delta$-net. For direct problems, when
$A$ is the identity operator, $\varrho(Q, \mathcal{F}_\delta) \asymp 1$. As examples of operators $A$ we
consider the convolution operator and the Radon transform. For these
operators the estimators achieve the minimax rates of convergence over Sobolev
classes.

The general framework for empirical risk minimization and the use of the
empirical process machinery, including entropy bounds for deriving optimal
bounds, seem to be new. Convolution and Radon transforms are discussed
for illustrative purposes. These examples show that our results lead to
optimal rates of convergence. As a new application we introduce the estimation
of additive models in inverse problems. A new type of oracle inequality is
presented, which gives the optimal rates of convergence also in "anisotropic"
inverse problems.

Contents. Section 2 gives an upper bound for the MISE of the $\delta$-net
estimator. Section 3 gives a lower bound for the MISE of any estimator. Section 4
gives an upper bound for the MISE of the dense empirical risk minimizer.
Section 5 finds the adjoint of the inverse of $A$, when $A$ is a convolution
operator or the Radon transform. Section 6 proves that the $\delta$-net estimator
achieves the optimal rate of convergence in the ellipsoidal framework and
it contains an oracle inequality for additive models. Section 7 contains the
proofs of the main results. The appendix contains calculations related to
ellipsoids.

Notation. We use the notation $\|\cdot\|$ to mean the Euclidean norm in $\mathbf{R}^d$.
The $L_2$ norm of a function $g : \mathbf{R}^d \to \mathbf{R}$ will be denoted by $\|g\|_2$. The unit
sphere in $\mathbf{R}^d$ is denoted by $S_{d-1} = \{x \in \mathbf{R}^d : \|x\| = 1\}$. The Lebesgue
measure on $S_{d-1}$ is denoted by $\mu$. We will make use of the formula $\mu(S_{d-1}) =
2\pi^{d/2}/\Gamma(d/2)$. By $I_R$ we denote the indicator function, i.e. $I_R(x) = 1$ when
$x \in R$ and $I_R(x) = 0$ otherwise. We write $a_n \asymp b_n$ to mean that
$0 < \liminf_{n\to\infty} a_n/b_n \le \limsup_{n\to\infty} a_n/b_n < \infty$, and $a_n \succcurlyeq b_n$ means that
$\liminf_{n\to\infty} a_n/b_n > 0$. The Fourier transform of a function $g \in L_1(\mathbf{R}^d)$
is defined by

$(Fg)(\omega) = \int_{\mathbf{R}^d} \exp\{i x^T\omega\}\, g(x)\,dx$, $\omega \in \mathbf{R}^d$,

where $i$ is the imaginary unit. We use also the notation $F_1 g$ when $g : \mathbf{R} \to \mathbf{R}$
is univariate. We have

$g(x) = (2\pi)^{-d} \int_{\mathbf{R}^d} \exp\{-i x^T\omega\}\,(Fg)(\omega)\,d\omega$, $x \in \mathbf{R}^d$.

By Parseval's theorem, we have for $f, g \in L_1(\mathbf{R}^d) \cap L_2(\mathbf{R}^d)$,

$\int_{\mathbf{R}^d} fg = (2\pi)^{-d} \int_{\mathbf{R}^d} (Ff)(Fg)$.

Convolution of $f$ and $g$ is denoted by $f * g(x) = \int_{\mathbf{R}^d} f(x - y) g(y)\,dy$. We have
that

(7) $F(f * g) = (Ff)(Fg)$.

The probability measures of the Gaussian white noise process $Y_n$ and of the
i.i.d. sequence $(Y_1, \ldots, Y_n)$ are denoted by $P_f$.

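2. $\delta$-net minimizer.

Definition of the estimator. Let $\mathcal{F}$ be a set of densities or signal functions
$f : \mathbf{R}^d \to \mathbf{R}$. Let $\mathcal{F}_\delta$ be a finite $\delta$-net of $\mathcal{F}$ in the $L_2$ metric, where $\delta > 0$.
That is, for each $f \in \mathcal{F}$ there is a $\phi \in \mathcal{F}_\delta$ such that $\|f - \phi\|_2 \le \delta$. Define the
estimator $\hat f$ by

$\hat f = \mathrm{argmin}_{\phi \in \mathcal{F}_\delta} \gamma_n(\phi)$,

where $\gamma_n(\phi)$ is defined in (3). Typically we would like to choose a $\delta$-net of
minimal cardinality. We assume that $\mathcal{F}$ is bounded in the $L_2$ metric,

(8) $\sup_{g \in \mathcal{F}} \|g\|_2 \le B_2$,

where $0 < B_2 < \infty$.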
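The definition above can be illustrated with a minimal sketch in sequence space. It assumes (this is not part of the definition) that $A$ is diagonal in an orthonormal basis with singular values $b_j$, as in the singular value framework of Section 6.1.1, so that the white noise observation reduces to coefficients $y_j = b_j\theta_j + n^{-1/2}\varepsilon_j$ and the empirical risk (3) becomes $-2\sum_j \phi_j y_j/b_j + \sum_j \phi_j^2$. The function names and the toy net are hypothetical.

```python
def empirical_risk(phi, y, b):
    """L2 empirical risk (3) in sequence form, Gaussian white noise case.

    phi: candidate coefficients, y: observed coefficients, b: singular values.
    Assumes a diagonal operator, so (Q phi)_j = phi_j / b_j.
    """
    cross = sum(p * yj / bj for p, yj, bj in zip(phi, y, b))
    return -2.0 * cross + sum(p * p for p in phi)


def delta_net_minimizer(net, y, b):
    """Return the element of the finite net with the smallest empirical risk."""
    return min(net, key=lambda phi: empirical_risk(phi, y, b))


# Toy example (hypothetical numbers): three singular values, noise-free data.
b = [1.0, 0.5, 0.25]
theta = [0.9, 0.4, 0.1]                      # true coefficients
y = [bj * tj for bj, tj in zip(b, theta)]    # y_j = b_j * theta_j, no noise
net = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.5, 0.5, 0.5)]
best = delta_net_minimizer(net, y, b)
```

Without noise the empirical risk equals $\|\phi - \theta\|_2^2 - \|\theta\|_2^2$, so the minimizer is simply the net element closest to $\theta$; with noise the division by $b_j$ amplifies the error in the ill-posed directions.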

An upper bound to MISE. Theorem 1 gives a bound for the mean integrated
squared error of the estimator. We may identify the first term in the bound
as a bias term and the second term as a variance term. The variance term
depends on the operator norm of $Q$ over the $\delta$-net $\mathcal{F}_\delta$. We define this operator
norm as

(9) $\varrho(Q, \mathcal{F}_\delta) = \max_{\phi, \phi' \in \mathcal{F}_\delta,\ \phi \ne \phi'} \frac{\|Q(\phi - \phi')\|_2}{\|\phi - \phi'\|_2}$, $\delta > 0$,

where $Q$ is defined by (3). In the case of density estimation we need the
additional assumption that $\varrho(Q, \mathcal{F}_\delta) \ge 1$ and that $A\mathcal{F}$ and $Q\mathcal{F}$ are bounded
in the $L_\infty$ metric:

(10) $\varrho(Q, \mathcal{F}_\delta) \ge 1$, $\sup_{f \in \mathcal{F}} \|Af\|_\infty \le B_\infty$, $\sup_{f \in \mathcal{F}} \|Qf\|_\infty \le B'_\infty$,

where $0 < B_\infty, B'_\infty < \infty$.



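Theorem 1. For density estimation we assume that (10) is satisfied.
We have that for $f \in \mathcal{F}$,

$E\|\hat f - f\|_2^2 \le C_1\delta^2 + C_2\, \frac{\varrho^2(Q, \mathcal{F}_\delta)\,(\log_e(\#\mathcal{F}_\delta) + 1)}{n}$,

where

(11) $C_1 = (1 - 2\xi)^{-1}(1 + 2\xi)$,

(12) $C_2 = (1 - 2\xi)^{-1}\xi C_\tau$,

(13) $C_\tau > 0$,

and $\xi$ is such that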

(14) 

f + v/2 [8(S^)V9 + C^Sooj) < ^ < 1/2, density estimation 

[ y^2/Cr < ^ < 1/2, white noise. 



A proof of Theorem 1 is given in Section 7.2.

Remark 1. Theorem 1 shows that the $\delta$-net estimator achieves the rate
of convergence $\psi_n$, when $\psi_n$ is the solution of the equation

(15) $\psi_n^2 = n^{-1}\varrho^2(Q, \mathcal{F}_{\psi_n}) \log(\#\mathcal{F}_{\psi_n})$.

We calculate the rate under the assumptions that $\log(\#\mathcal{F}_\delta)$ and $\varrho(Q, \mathcal{F}_\delta)$
increase polynomially as $\delta$ decreases: we assume that one can find a $\delta$-net
whose cardinality satisfies

$\log(\#\mathcal{F}_\delta) = C\delta^{-b}$

for some constants $b, C > 0$ and we assume that

$\varrho(Q, \mathcal{F}_\delta) = C'\delta^{-a}$

for some $a \ge 0$, $C' > 0$ (in the direct case $a = 0$ and $C' = 1$). Then (15) can be
written as $\psi_n^2 \asymp n^{-1}\psi_n^{-2a-b}$ and the rate of the $\delta$-net estimator is

(16) $\psi_n \asymp n^{-1/[2(a+1)+b]}$.

Let $\mathcal{F}$ be a set of $s$-smooth $d$-dimensional functions, so that $b = d/s$. Then
the rate is

$\psi_n \asymp n^{-s/[2(a+1)s+d]}$,

which gives for the direct case $a = 0$ the classical rate $\psi_n \asymp n^{-s/(2s+d)}$.
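The rate calculation in Remark 1 can be checked numerically. The sketch below assumes the polynomial case with exact equalities, $\log(\#\mathcal{F}_\delta) = C\delta^{-b}$ and $\varrho(Q,\mathcal{F}_\delta) = C'\delta^{-a}$; the function names, the bracketing interval and the sample values are illustrative choices, not taken from the paper.

```python
def rate_exponent(a, b):
    """Exponent r with psi_n of order n**(-r), from psi^2 = n^{-1} psi^{-2a-b}."""
    return 1.0 / (2.0 * (a + 1.0) + b)


def solve_rate(n, a, b, C=1.0, C_prime=1.0, iterations=200):
    """Solve psi^2 = n^{-1} (C' psi^{-a})^2 (C psi^{-b}) for psi by bisection."""
    def gap(psi):
        # Left side minus right side of equation (15); increasing in psi.
        return psi ** 2 - (C_prime * psi ** (-a)) ** 2 * C * psi ** (-b) / n

    lo, hi = 1e-12, 1e3          # assumed to bracket the root
    for _ in range(iterations):
        mid = (lo * hi) ** 0.5   # bisect on the log scale
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo * hi) ** 0.5


# Direct problem (a = 0) with s = 2, d = 1, so b = d / s = 0.5.
direct = rate_exponent(0.0, 0.5)       # agrees with s / (2s + d)
# Ill-posed example with a = 0.5, b = 1 and n = 10**6.
psi = solve_rate(10 ** 6, 0.5, 1.0)
```

With $C = C' = 1$ the numerical root coincides with the closed form $n^{-1/[2(a+1)+b]}$ in (16); other constants only change the leading factor, not the exponent.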
3. A lower bound for MISE. Theorem 2 gives a lower bound for the
mean integrated squared error of any estimator, when estimating densities
or signal functions $f : \mathbf{R}^d \to \mathbf{R}$ in the function class $\mathcal{F}$. Theorem 2 holds
also for nonlinear operators.

Theorem 2. Let $A$ be a possibly nonlinear operator. Assume that for
each sufficiently small $\delta > 0$ we find a finite set $\mathcal{D}_\delta \subset \mathcal{F}$ for which

(17) $\min\{\|f - g\|_2 : f, g \in \mathcal{D}_\delta,\ f \ne g\} \ge C_0\delta$

and

(18) $\max\{\|f - g\|_2 : f, g \in \mathcal{D}_\delta\} \le C_1\delta$, white noise,
     $\max\{D_K(f, g) : f, g \in \mathcal{D}_\delta\} \le C_1\delta$, density estimation,

where $D_K^2(f, g) = \int \log_e(f/g)\, f$ is the Kullback-Leibler distance, and $C_0, C_1$
are positive constants. Denote

$\varrho_K(A, \mathcal{D}_\delta) = \begin{cases} \max_{f, g \in \mathcal{D}_\delta,\ f \ne g} \frac{\|A(f - g)\|_2}{\|f - g\|_2}, & \text{white noise,} \\ \max_{f, g \in \mathcal{D}_\delta,\ f \ne g} \frac{D_K(Af, Ag)}{\|f - g\|_2}, & \text{density estimation.} \end{cases}$

Let $\psi_n$ be such that

(19) $\log_e(\#\mathcal{D}_{\psi_n}) \succcurlyeq n\psi_n^2 \varrho_K^2(A, \mathcal{D}_{\psi_n})$,

where $a_n \succcurlyeq b_n$ means that $\liminf_{n\to\infty} a_n/b_n > 0$. Assume that

(20) $\lim_{n\to\infty} n\psi_n^2 \varrho_K^2(A, \mathcal{D}_{\psi_n}) = \infty$.

Then,

$\liminf_{n\to\infty} \psi_n^{-2} \inf_{\hat f} \sup_{f \in \mathcal{F}} E\|\hat f - f\|_2^2 > 0$,

where the infimum is taken over all estimators. That is, $\psi_n$ is a lower bound
for the minimax rate of convergence.

A proof of Theorem 2 is given in Section 7.3.

Remark 2. Theorem 2 shows that one can get a lower bound $\psi_n$ for
the rate of convergence by solving the equation

(21) $n\psi_n^2 \varrho_K^2(A, \mathcal{D}_{\psi_n}) \asymp \log_e(\#\mathcal{D}_{\psi_n})$.

The upper bound in Theorem 1 depends on the operator norm of $Q$, defined
in (9), whereas the lower bound depends on the operator norm of $A$. Note
also that the operator norm $\varrho(Q, \mathcal{F}_{\psi_n})$ is on a different side of the equation
in (15) than the operator norm $\varrho_K(A, \mathcal{D}_{\psi_n})$ in the equation (21).

Remark 3. In the density estimation case one can easily check
assumptions (18) and (20) if one assumes that the functions in $A\mathcal{D}_\delta$ are bounded
and bounded away from 0. Then,

(22) $C \cdot \|A(f - g)\|_2 \le D_K(Af, Ag) \le C' \cdot \|A(f - g)\|_2$,

and (18) and (20) follow by the corresponding conditions with Hilbert norms
instead of Kullback-Leibler distances.
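To see how the lower-bound equation (21) compares with the upper-bound equation (15), consider a worked polynomial case. It assumes, for illustration only, that $\varrho_K(A,\mathcal{D}_\delta) \asymp \delta^{a}$ and $\log_e(\#\mathcal{D}_\delta) \asymp \delta^{-b}$, which is the behaviour later obtained for ellipsoids with $a = q/s$ and $b = d/s$.

```latex
% Assumptions (illustrative): \varrho_K(A,\mathcal{D}_\delta) \asymp \delta^{a},
% \log_e(\#\mathcal{D}_\delta) \asymp \delta^{-b}, with a \ge 0 and b > 0.
\begin{align*}
  n\,\psi_n^{2}\,\varrho_K^{2}(A,\mathcal{D}_{\psi_n})
    &\asymp \log_e(\#\mathcal{D}_{\psi_n})
    && \text{equation (21)} \\
  n\,\psi_n^{2}\,\psi_n^{2a}
    &\asymp \psi_n^{-b} \\
  \psi_n^{2(a+1)+b}
    &\asymp n^{-1} \\
  \psi_n
    &\asymp n^{-1/[2(a+1)+b]} .
\end{align*}
```

Under these assumptions the lower bound has the same order as the upper rate (16): the factor $\delta^{-a}$ for $Q$ appears on the right side of (15), and the factor $\delta^{a}$ for $A$ appears on the left side of (21), so both lead to the same exponent.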

4. Dense minimizer. The dense minimizer minimizes the empirical
risk over the whole function class $\mathcal{F}$. In contrast to the $\delta$-net estimator,
the minimization is not restricted to a $\delta$-net. We call this estimator "dense
minimizer" because it is defined as a minimizer over a possibly uncountable
function class. The $\delta$-net estimator is more widely applicable: it may be
applied also to estimate unsmooth functions and it may be applied when
the operator is severely ill-posed. The dense minimizer may be applied only
for relatively smooth cases (the entropy integral has to converge). Because
it works without a restriction to a $\delta$-net, we have available a larger toolbox
of numerical algorithms that can be applied.

Definition of the estimator. Let $\mathcal{F}$ be a collection of functions $f : \mathbf{R}^d \to \mathbf{R}$,
which are bounded in the $L_2$ metric as in (8), and let the estimator $\hat f$ be a
minimizer of the empirical risk over $\mathcal{F}$, up to $\epsilon > 0$:

$\gamma_n(\hat f) \le \inf_{g \in \mathcal{F}} \gamma_n(g) + \epsilon$,

where $\gamma_n(\phi)$ is defined in (3). For clarity, we present separate theorems for
the Gaussian white noise model and for the density estimation model.

4.1. Gaussian white noise.

An upper bound to MISE. Let $\mathcal{F}_\delta$, $\delta > 0$, be a $\delta$-net of $\mathcal{F}$ with respect to
the $L_2$ norm. Define

(23) $\varrho(Q, \mathcal{F}_\delta) = \max\left\{ \frac{\|Q(f - g)\|_2}{\|f - g\|_2} : f \in \mathcal{F}_\delta,\ g \in \mathcal{F}_{2\delta},\ f \ne g \right\}$, $\delta > 0$,

where $Q$ is the adjoint of the inverse of $A$, defined by (4). Define the entropy
integral

(24) $G(\delta) = \int_0^\delta \varrho(Q, \mathcal{F}_u)\sqrt{\log(\#\mathcal{F}_u)}\,du$, $\delta \in (0, B_2]$,

where $B_2$ is the $L_2$ bound defined by (8).

Theorem 3. Assume that

1. the entropy integral in (24) converges,

2. $G(\delta)/\delta^2$ is decreasing on the interval $(0, B_2]$,

3. $\varrho(Q, \mathcal{F}_\delta) = c\delta^{-a}$, where $0 \le a < 1$ and $c > 0$,

4. $\lim_{\delta \to 0} G(\delta)\delta^{a-1} = \infty$,

5. $\delta \mapsto \varrho(Q, \mathcal{F}_\delta)\sqrt{\log_e(\#\mathcal{F}_\delta)}$ is decreasing on $(0, B_2]$.

Let $\psi_n$ be such that

(25) $\psi_n^2 \ge Cn^{-1/2}G(\psi_n)$,

where $C$ is a positive constant, and assume that $\lim_{n\to\infty} n\psi_n^{2(1+a)} = \infty$.
Then, for $f \in \mathcal{F}$,

$E\|\hat f - f\|_2^2 \le C'(\psi_n^2 + \epsilon)$,

for a positive constant $C'$, for sufficiently large $n$.

A proof of Theorem 3 is given in Section 7.4.

Remark 4. Assumption 5 is a technical assumption which is used to
replace a Riemann sum by an entropy integral. We prefer to write the
assumptions in terms of the entropy integral in order to make them more
readable.

Remark 5. We may write $\varrho(Q, \mathcal{F}_\delta)$ in a simpler way when there exist
minimal $\delta$-nets $\mathcal{F}_\delta$ which are nested:

$\mathcal{F}_{2\delta} \subset \mathcal{F}_\delta$.

Then we may define alternatively

$\varrho(Q, \mathcal{F}_\delta) = \max_{f, g \in \mathcal{F}_\delta,\ f \ne g} \frac{\|Q(f - g)\|_2}{\|f - g\|_2}$.

Remark 6. Theorem 3 and Theorem 4 show that the rate of
convergence of the dense minimizer is the solution of the equation

(26) $\psi_n^2 = n^{-1/2}G(\psi_n)$.

To get the optimal rate, the net $\mathcal{F}_\delta$ is chosen so that its cardinality is minimal.
In the polynomial case one can find a $\delta$-net whose cardinality satisfies

$\log(\#\mathcal{F}_\delta) = C\delta^{-b}$

imsart-aos ver. 2007/12/10 file: invemp-tech.tex date: April 20, 2009 






for some constants $b, C > 0$ and the operator norm satisfies

$\varrho(Q, \mathcal{F}_\delta) = C'\delta^{-a}$

for some $a \ge 0$, $C' > 0$. (In the direct case $a = 0$ and $C' = 1$.) Thus the entropy
integral $G(\delta)$ is finite when $\int_0^1 u^{-a-b/2}\,du < \infty$, which holds when

(27) $a + b/2 < 1$.

Then (26) leads to $\psi_n^2 \asymp n^{-1/2}\psi_n^{-a-b/2+1}$ and the rate of the dense
minimization estimator is

(28) $\psi_n \asymp n^{-1/[2(a+1)+b]}$.

This is the same rate as the rate of the $\delta$-net estimator given in (16). We have
the following example. Let $\mathcal{F}$ be a set of $s$-smooth $d$-dimensional functions,
so that $b = d/s$. Then condition (27) may be written as a condition for the
smoothness index $s$:

$s > \frac{d}{2(1 - a)}$.

When the problem is direct, then $a = 0$, and we have the classical condition
$s > d/2$. The rate is $\psi_n \asymp n^{-s/[2(a+1)s+d]}$, which gives for the direct case
$a = 0$ the classical rate $\psi_n \asymp n^{-s/(2s+d)}$.
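The convergence condition (27) and the resulting smoothness threshold can be made concrete with a small sketch. It assumes the polynomial case with exact equalities, so that the entropy integral (24) has the closed form $G(\delta) = c\sqrt{C}\,\delta^{1-a-b/2}/(1-a-b/2)$ when $a + b/2 < 1$; the function names are hypothetical.

```python
import math


def entropy_integral(delta, a, b, c=1.0, C=1.0):
    """G(delta) = int_0^delta c u^{-a} sqrt(C u^{-b}) du in the polynomial case.

    Returns infinity when condition (27), a + b/2 < 1, fails.
    """
    p = 1.0 - a - b / 2.0
    if p <= 0.0:
        return math.inf
    return c * math.sqrt(C) * delta ** p / p


def min_smoothness(d, a):
    """Threshold from (27) with b = d/s: the dense minimizer needs s > d / (2(1-a))."""
    if not 0.0 <= a < 1.0:
        raise ValueError("requires 0 <= a < 1")
    return d / (2.0 * (1.0 - a))


g_direct = entropy_integral(1.0, 0.0, 1.0)   # a = 0, b = 1: the integral converges
g_fail = entropy_integral(1.0, 0.5, 1.0)     # a + b/2 = 1: the integral diverges
s_direct = min_smoothness(2, 0.0)            # classical condition s > d/2
s_illposed = min_smoothness(2, 0.5)          # ill-posedness raises the threshold
```

The threshold grows without bound as $a$ approaches 1, which reflects the restriction $0 \le a < 1$ in Theorem 3: for more strongly ill-posed operators only the $\delta$-net estimator is covered by the results of this section.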

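4.2. Density estimation. Let us call a $\delta$-bracketing net of $\mathcal{F}$ with respect
to the $L_2$ norm a set of pairs of functions $\mathcal{F}_\delta = \{(g_j^L, g_j^U) : j = 1, \ldots, N_\delta\}$
such that

1. $\|g_j^L - g_j^U\|_2 \le \delta$, $j = 1, \ldots, N_\delta$,

2. for each $g \in \mathcal{F}$ there is $j = j(g) \in \{1, \ldots, N_\delta\}$ such that $g_j^L \le g \le g_j^U$.

Let us denote $\mathcal{F}_\delta^L = \{g_j^L : j = 1, \ldots, N_\delta\}$ and $\mathcal{F}_\delta^U = \{g_j^U : j = 1, \ldots, N_\delta\}$.
Define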

(29) Qden{Q,J'&) = max{f?(Q,.Ff ),£)(Q,.Fh-^2^5)} , 
where 

,(g,^^^f ) = max/ 'iyf -^'/"^ : / G G A 

y \w - g h J 

and

for $\delta > 0$. Define the entropy integral

(30) $G(\delta) = \int_0^\delta \varrho_{den}(Q, \mathcal{F}_u)\sqrt{\log(\#\mathcal{F}_u)}\,du$, $\delta \in (0, B_2]$,
where B2 = sup j-^yr \ 



Theorem 4. We make Assumptions 1-5 of Theorem 3 (with the operator
norm $\varrho_{den}(Q, \mathcal{F}_\delta)$ in place of $\varrho(Q, \mathcal{F}_\delta)$), and in addition we assume that
$\sup_{f \in \mathcal{F}} \|Af\|_\infty < \infty$, $\sup_{g \in \mathcal{F}_\delta^L \cup \mathcal{F}_\delta^U} \|Qg\|_\infty < \infty$, and that the operator $Q$
preserves positivity ($g \ge 0$ implies that $Qg \ge 0$). Let $\psi_n$ be such that

(31) $\psi_n^2 \ge Cn^{-1/2}G(\psi_n)$,

for a positive constant $C$, and assume that $\lim_{n\to\infty} n\psi_n^{2(1+a)} = \infty$. Then,
for $f \in \mathcal{F}$,

$E\|\hat f - f\|_2^2 \le C'(\psi_n^2 + \epsilon)$,

for a positive constant $C'$, for sufficiently large $n$.

A proof of Theorem 4 is given in Section 7.5. An analogous discussion
of optimal rates as in Remark 6 for the Gaussian white noise model also
applies for dense density estimators.

5. Examples of operators. As examples for operators we consider 
convolution operators and the Radon transform. The definition of the em- 
pirical risk involves the adjoint of the inverse of the operator A, and we 
calculate the adjoint of the inverse of A, when A is a convolution operator 
or the Radon transform. 

5.1. Convolution. The convolution operator $A$ is defined by

$Af = a * f$, $f : \mathbf{R}^d \to \mathbf{R}$,

where $a : \mathbf{R}^d \to \mathbf{R}$ is a known integrable function. The adjoint of the inverse
of $A$ is $Q$, defined for $g : \mathbf{R}^d \to \mathbf{R}$ by

(32) $Qg = F^{-1}\left(\frac{Fg}{Fa}\right)$,

where $F$ denotes the Fourier transform. To derive this equation note that,
for $h : \mathbf{R}^d \to \mathbf{R}$,

$F(A^{-1}h) = \frac{Fh}{Fa}$.

Thus, for $h : \mathbf{R}^d \to \mathbf{R}$, $g : \mathbf{R}^d \to \mathbf{R}$, applying Parseval's theorem twice
gives

$\int_{\mathbf{R}^d} (A^{-1}h)\,g = (2\pi)^{-d}\int_{\mathbf{R}^d} \frac{(Fh)(Fg)}{Fa} = \int_{\mathbf{R}^d} h\,(Qg)$.

Convolution operators appear in density estimation when the observations
contain additional measurement errors. In the errors-in-variables model we
observe $Y_i = X_i + \epsilon_i$, $i = 1, \ldots, n$, where $X_i \sim f$, $f : \mathbf{R}^d \to \mathbf{R}$ is the unknown
density which we want to estimate, and $\epsilon_i \sim a$ are the measurement errors.
The density of the observations $Y_i$ is $Af = a * f$.
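The adjoint relation (4) for a convolution operator can be checked on a discrete, periodic analogue. The sketch below is an illustration under stated assumptions, not the paper's construction: it replaces $\mathbf{R}^d$ by a cyclic grid of 8 points, uses a plain discrete Fourier transform, and divides by the complex conjugate of the kernel transform, which is what the transpose of the inverse requires for a real kernel on the grid. For a real symmetric kernel the transform is real and the conjugation has no effect. All names and numbers are hypothetical.

```python
import cmath


def dft(x):
    """Plain discrete Fourier transform (O(n^2), for small illustrative inputs)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * w / n) for k in range(n))
            for w in range(n)]


def idft(X):
    """Inverse of dft."""
    n = len(X)
    return [sum(X[w] * cmath.exp(2j * cmath.pi * k * w / n) for w in range(n)) / n
            for k in range(n)]


def circ_conv(a, f):
    """Circular convolution (A f)[m] = sum_k a[(m - k) mod n] f[k]."""
    n = len(a)
    return [sum(a[(m - k) % n] * f[k] for k in range(n)) for m in range(n)]


def apply_A_inverse(a, h):
    """Deconvolution: divide the transform of h by the transform of a."""
    A_hat, H = dft(a), dft(h)
    return [z.real for z in idft([H[w] / A_hat[w] for w in range(len(a))])]


def apply_Q(a, g):
    """Adjoint of the inverse: divide by the conjugate of the kernel transform."""
    A_hat, G = dft(a), dft(g)
    return [z.real for z in idft([G[w] / A_hat[w].conjugate() for w in range(len(a))])]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


# Hypothetical error density on the grid; its transform has no zeros.
a = [0.6, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
h = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 1.5, -0.5]
g = [0.2, -0.4, 1.0, 0.7, 0.0, -1.2, 0.3, 0.9]

lhs = dot(apply_A_inverse(a, h), g)   # <A^{-1} h, g>
rhs = dot(h, apply_Q(a, g))           # <h, Q g>
recovered = apply_A_inverse(a, circ_conv(a, g))
```

The two inner products agree up to rounding, and deconvolving a convolved signal returns the original. The division by the kernel transform is also where ill-posedness enters: the faster the transform of $a$ decays, the larger the norm of $Q$ on fine-scale functions.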

5.2. Radon transform. The Radon transform has been discussed in a
series of papers and books including Deans (1983) and Natterer (2001).
The Radon transform is defined as the integral of a $d$-dimensional function
over $(d-1)$-dimensional hyperplanes. We parameterize the $(d-1)$-dimensional
hyperplanes in the $d$-dimensional Euclidean space with the help of a direction
vector $\xi \in S_{d-1}$ and a distance from the origin $u \in [0, \infty)$:

(33) $P_{\xi,u} = \{z \in \mathbf{R}^d : z^T\xi = u\}$, $\xi \in S_{d-1}$, $u \in [0, \infty)$.

Define the Radon transform for $f : \mathbf{R}^d \to \mathbf{R}$ as

$(Af)(\xi, u) = \int_{P_{\xi,u}} f$, $\xi \in S_{d-1}$, $u \in [0, \infty)$,

where the integration is with respect to the $(d-1)$-dimensional Lebesgue
measure. We will take the Radon transform as a mapping from functions
$f : \mathbf{R}^d \to \mathbf{R}$ to functions $Af : \mathbf{Y} \to \mathbf{R}$, where $\mathbf{Y} = S_{d-1} \times [0, \infty)$, and the
measure $\nu$ of the Borel space $(\mathbf{Y}, \mathcal{Y}, \nu)$ is taken to be $d\nu(\xi, u) = u^{d-1}\,du\,d\mu(\xi)$.
The adjoint of the inverse of $A$ is $Q$, defined for $g : \mathbf{R}^d \to \mathbf{R}$ by

(34) $(Qg)(\xi, u) = (2\pi)^{-(d-1)} \cdot (F_1^{-1} I_\xi g)(u)$, $\xi \in S_{d-1}$, $u \in [0, \infty)$,

where

$(I_\xi g)(t) = (Fg)(t\xi)$, $\xi \in S_{d-1}$, $t \in [0, \infty)$.

To see this note first that, for $h : S_{d-1} \times [0, \infty) \to \mathbf{R}$, we have that

(35) $(FA^{-1}h)(\omega) = (\mathcal{H}_{\omega/\|\omega\|} h)(\|\omega\|)$, $\omega \in \mathbf{R}^d$,

where $\mathcal{H}_\xi$ is the Fourier transform of $h(\xi, \cdot)$ for fixed $\xi \in S_{d-1}$:

$\mathcal{H}_\xi h = F_1(h(\xi, \cdot))$, $\xi \in S_{d-1}$.

Equation (35) follows directly from the projection theorem, see Natterer
(2001).

Two applications of Parseval's theorem and (35) give for $h : S_{d-1} \times
[0, \infty) \to \mathbf{R}$, $g : \mathbf{R}^d \to \mathbf{R}$, that

$\int_{\mathbf{R}^d} (A^{-1}h)\,g = (2\pi)^{-d} \int_{S_{d-1}} \int_0^\infty t^{d-1} (\mathcal{H}_\xi h)(t)\,(Fg)(t\xi)\,dt\,d\mu(\xi)$

$= (2\pi)^{-(d-1)} \int_{S_{d-1}} \int_0^\infty u^{d-1} h(\xi, u)\,(F_1^{-1} I_\xi g)(u)\,du\,d\mu(\xi)$

$= \int_{\mathbf{Y}} h\,(Qg)\,d\nu$.

This shows (34).

2D Radon transform. In the 2D case we consider reconstructing a
2-dimensional function from observations of its integrals over lines. Let $D = \{x \in
\mathbf{R}^2 : \|x\| \le 1\}$ be the unit disk in $\mathbf{R}^2$. The plane in (33) can be written as
$P_{\xi,u} = \{u\xi + t\xi^\perp : t \in \mathbf{R}\}$, where $\xi^\perp$ is a vector which is orthogonal to $\xi$. We
can write $\xi = (\cos\phi, \sin\phi)$ and $\xi^\perp = (-\sin\phi, \cos\phi)$. Thus we parameterize
the lines by the length $u \in [0, 1]$ of the perpendicular from the origin to the
line and by the orientation $\phi \in [0, 2\pi)$ of this perpendicular. A common way
to define the 2D Radon transform is

(36) $Af(u, \phi) = \frac{\pi}{2\sqrt{1 - u^2}} \int_{-\sqrt{1 - u^2}}^{\sqrt{1 - u^2}} f(u\cos\phi - t\sin\phi,\ u\sin\phi + t\cos\phi)\,dt$,

where $(u, \phi) \in \mathbf{Y} = [0, 1] \times [0, 2\pi)$, and we suppose that $f \in L_1(D) \cap L_2(D)$.
Now the Radon transform is $\pi$ times the average of $f$ over the line segment
that intersects $D$. We consider $Af$ as an element of $L_2(\mathbf{Y}, \nu)$, where $\nu$ is
the measure defined by $d\nu(u, \phi) = 2\pi^{-1}\sqrt{1 - u^2}\,du\,d\phi$.
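Definition (36) can be evaluated numerically with a simple quadrature. The following sketch uses a midpoint rule along the chord; the function name `radon_2d`, the number of quadrature points and the test functions are illustrative assumptions.

```python
import math


def radon_2d(f, u, phi, m=2000):
    """Approximate (36): pi times the average of f over the chord of the unit disk.

    The chord is {u*xi + t*xi_perp : |t| <= sqrt(1 - u^2)} with
    xi = (cos phi, sin phi) and xi_perp = (-sin phi, cos phi).
    """
    half = math.sqrt(1.0 - u * u)       # half-length of the chord
    if half == 0.0:
        raise ValueError("the chord degenerates to a point at u = 1")
    step = 2.0 * half / m
    total = 0.0
    for i in range(m):
        t = -half + (i + 0.5) * step    # midpoint of the i-th subinterval
        x = u * math.cos(phi) - t * math.sin(phi)
        y = u * math.sin(phi) + t * math.cos(phi)
        total += f(x, y)
    integral = total * step
    return math.pi / (2.0 * half) * integral


# Uniform density on the unit disk: the average over any chord is 1/pi.
uniform = radon_2d(lambda x, y: 1.0 / math.pi, 0.3, 1.0)
# Linear function f(x, y) = x: the chord average is u * cos(phi).
linear = radon_2d(lambda x, y: x, 0.5, 0.0)
```

With this normalization the uniform density on $D$ is mapped to the constant function 1, which illustrates why the factor $\pi/(2\sqrt{1-u^2})$ is convenient: it removes the dependence on the chord length.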

Tomography. Positron emission tomography is a density estimation
problem, but X-ray tomography is a regression type problem. In the
setting of positron emission tomography events happen at points $X_1, \ldots, X_n \in
\mathbf{R}^d$, and these points are i.i.d. with density $f$. We do not observe the location
of the points but only that an event has occurred on a hyperplane
containing the point. We assume that the hyperplane is uniformly oriented, and
that the distance of the hyperplane from the origin is given by the Radon
transform:

(37) $S \sim \mathrm{Unif}(S_{d-1})$, $U \mid S = \xi \sim (Af)(\xi, \cdot)$,

where hyperplanes are written as $\{z \in \mathbf{R}^d : z^T S = U\}$. We assume that we observe
i.i.d. random variables $Y_i = (S_i, U_i) \in S_{d-1} \times [0, \infty)$, $i = 1, \ldots, n$, which
are distributed as $(S, U)$. This is equivalent to observing the hyperplanes
$\{z \in \mathbf{R}^d : z^T S_i = U_i\}$. We want to estimate the density $f : \mathbf{R}^d \to \mathbf{R}$ in (37).
The density of the observations $Y_i$ is equal to

(38) $(\xi, u) \mapsto \frac{1}{\mu(S_{d-1})}\,(Af)(\xi, u)$, $\xi \in S_{d-1}$, $u \in [0, \infty)$.

6. Examples of function spaces. 

6.1. Ellipsoidal function spaces. Since we are in the $L_2$ setting it is
natural to work in the sequence space; we define the function classes as
ellipsoids. We shall apply singular value decompositions of the operators and
wavelet-vaguelette systems in the calculation of the rates of convergence. In
Section 6.1.1 we calculate the operator norms in the framework of singular
value decomposition. In Section 6.1.2 we calculate the operator norms in the
wavelet-vaguelette framework. Section 6.1.3 derives the rate of convergence
of the $\delta$-net estimator for the case of a convolution operator and the Radon
transform, and the lower bound for the rate of convergence of any estimator.

6.1.1. Singular value decomposition. We assume that the underlying
function space $\mathcal{F}$ consists of $d$-variate functions that are linear combinations of
orthonormal basis functions $\phi_j$ with multi-index $j = (j_1, \ldots, j_d) \in \{0, 1, \ldots\}^d$.
Define the ellipsoid and the corresponding collection of functions by

(39) $\Theta = \left\{ \theta : \sum_{j_1=0,\ldots,j_d=0}^\infty a_j^2\theta_j^2 \le L^2 \right\}$, $\mathcal{F} = \left\{ \sum_{j_1=0,\ldots,j_d=0}^\infty \theta_j\phi_j : \theta \in \Theta \right\}$.

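$\delta$-net and $\delta$-packing set for polynomial ellipsoids. We assume that there
exist positive constants $C_1, C_2$ such that for all $j \in \{0, 1, \ldots\}^d$

(40) $C_1 \cdot |j|^s \le a_j \le C_2 \cdot |j|^s$,

where $|j| = j_1 + \cdots + j_d$. We construct a $\delta$-net $\Theta_\delta$ and a $\delta$-packing set $\Theta_\delta^*$ in
Appendix A. Since the construction is in the sequence space we define the
$\delta$-net and $\delta$-packing set of $\mathcal{F}$ by

(41) $\mathcal{F}_\delta = \left\{ \sum_{j_1=0,\ldots,j_d=0}^\infty \theta_j\phi_j : \theta \in \Theta_\delta \right\}$, $\mathcal{D}_\delta = \left\{ \sum_{j_1=0,\ldots,j_d=0}^\infty \theta_j\phi_j : \theta \in \Theta_\delta^* \right\}$.

The set $\Theta_\delta$ is such that for $\theta \in \Theta_\delta$

$\theta_j = 0$, when $j \notin \{1, \ldots, M\}^d$,

where

(42) $M \asymp \delta^{-1/s}$.

The set $\Theta_\delta^*$ is such that for all $\theta \in \Theta_\delta^*$

(43) $\theta_j = \theta_j^*$, when $j \notin \{M^*, \ldots, M\}^d$,

where $\theta^*$ is a fixed sequence with $\sum_{|j| > 0} a_j^2\theta_j^{*2} = L^* < L$,

$M^* = [M/2]$.

Furthermore, it holds that

(44) $\log(\#\Theta_\delta) \le C\delta^{-d/s}$, $\log(\#\Theta_\delta^*) \ge C'\delta^{-d/s}$.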

Operator norms. We calculate the operator norms $\varrho(Q, \mathcal{F}_\delta)$ and $\varrho_K(A, \mathcal{D}_\delta)$
in the ellipsoidal framework, where $\mathcal{F}_\delta$ and $\mathcal{D}_\delta$ are defined in (41) and
Appendix A. We apply the singular value decomposition of $A$. We assume that
the domain of $A$ is a separable Hilbert space $H$ with inner product $\langle\cdot, \cdot\rangle$. The
underlying function space $\mathcal{F}$ satisfies $\mathcal{F} \subset H$. We denote by $A^*$ the adjoint
of $A$. We assume that $A^*A$ is a compact operator on $H$ with eigenvalues $(b_j^2)$,
$b_j > 0$, $j \in \{0, 1, \ldots\}^d$, with an orthonormal system of eigenfunctions $\phi_j$. We
assume that there exist positive constants $q$ and $C_1, C_2$ such that for all $j$

(45) $C_1 \cdot |j|^{-q} \le b_j \le C_2 \cdot |j|^{-q}$.

Let $g, g'$ be in $\mathcal{F}_\delta$ or in $\mathcal{D}_\delta$, respectively. Write

$g - g' = \sum_{j_1=1,\ldots,j_d=1}^\infty (\theta_j - \theta'_j)\phi_j$.

1. The functions $Q\phi_j$ are orthogonal and $\|Q\phi_j\|_2 = b_j^{-1}$. Indeed, $Q =
(A^{-1})^*$, and thus

$\langle Q\phi_j, Q\phi_l \rangle = \langle \phi_j, A^{-1}(A^{-1})^*\phi_l \rangle = b_l^{-2}\langle \phi_j, \phi_l \rangle$,

where we used the fact

$A^{-1}(A^{-1})^*\phi_l = A^{-1}(A^*)^{-1}\phi_l = (A^*A)^{-1}\phi_l = b_l^{-2}\phi_l$.

(Footnote: Note that when a bounded linear operator $A$ between Banach spaces has a bounded
inverse, then $(A^{-1})^* = (A^*)^{-1}$, see Dunford & Schwartz (1958), Section VI, Lemma 7,
page 479.)

Thus for $g, g' \in \mathcal{F}_\delta$,

(46) $\|Q(g - g')\|_2^2 = \left\| \sum_{j_1=0,\ldots,j_d=0}^M (\theta_j - \theta'_j)Q\phi_j \right\|_2^2$
     $= \sum_{j_1=0,\ldots,j_d=0}^M (\theta_j - \theta'_j)^2 b_j^{-2}$
     $\le CM^{2q} \sum_{j_1=0,\ldots,j_d=0}^M (\theta_j - \theta'_j)^2$,

where we used (45) to infer that when $j \in \{0, \ldots, M\}^d$, then
$b_j^{-2} \le C_1^{-2} \cdot |j|^{2q} \le C_1^{-2} \cdot (dM)^{2q}$.

On the other hand, $\|g - g'\|_2^2 = \sum_{j_1=0,\ldots,j_d=0}^M (\theta_j - \theta'_j)^2$. This gives the
upper bound for the operator norm

(47) $\varrho(Q, \mathcal{F}_\delta) \le CM^q \le C'\delta^{-q/s}$,

by the definition of $M$ in (42).
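The bound (47) can be illustrated in a toy setting. The sketch assumes exact singular values $b_j = |j|^{-q}$ (that is, $C_1 = C_2 = 1$ in (45)) and indices $j \in \{1, \ldots, M\}^d$, so that $|j| \ge d > 0$; the function names and the sample coefficient differences are hypothetical.

```python
import itertools
import math


def singular_value(j, q):
    """b_j = |j|^{-q} with |j| = j_1 + ... + j_d (assumes C_1 = C_2 = 1)."""
    return float(sum(j)) ** (-q)


def q_norm_ratio(diff, q):
    """||Q(g - g')||_2 / ||g - g'||_2 for coefficient differences.

    diff maps a multi-index j to theta_j - theta'_j; Q acts as division by b_j.
    """
    num = sum(v * v / singular_value(j, q) ** 2 for j, v in diff.items())
    den = sum(v * v for v in diff.values())
    return math.sqrt(num / den)


def worst_case_ratio(M, d, q):
    """Largest possible ratio over differences supported on {1, ..., M}^d."""
    return max(1.0 / singular_value(j, q)
               for j in itertools.product(range(1, M + 1), repeat=d))


M, d, q = 4, 2, 1.5
worst = worst_case_ratio(M, d, q)                          # attained at j = (M, ..., M)
ratio = q_norm_ratio({(1, 1): 1.0, (4, 4): 1.0}, q)        # a particular difference
```

The worst case equals $(dM)^q$, attained by a difference concentrated on the highest index, which is the factor $CM^q$ appearing in (47). Any specific difference, such as the one above, gives a smaller ratio.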
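2. The functions $A\phi_j$ are orthogonal and $\|A\phi_j\|_2 = b_j$. Indeed,

$\langle A\phi_j, A\phi_l \rangle = \langle \phi_j, A^*A\phi_l \rangle = b_l^2\langle \phi_j, \phi_l \rangle$.

Thus for $g, g' \in \mathcal{D}_\delta$,

$\|A(g - g')\|_2^2 = \left\| \sum_{j_1=M^*,\ldots,j_d=M^*}^M (\theta_j - \theta'_j)A\phi_j \right\|_2^2 = \sum_{j_1=M^*,\ldots,j_d=M^*}^M (\theta_j - \theta'_j)^2 b_j^2$.

This and similar calculations as in (46) imply that

(48) $C\delta^{q/s} \le \varrho_K(A, \mathcal{D}_\delta) \le C'\delta^{q/s}$.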

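6.1.2. Wavelet-vaguelette decomposition. We assume that the
underlying function space $\mathcal{F}$ consists of $d$-variate functions which are linear
combinations of orthonormal wavelet functions $(\phi_{jk})$, where $j \in \{0, 1, \ldots\}$ and
$k \in \{0, \ldots, 2^j - 1\}^d$. The $\ell_2$-body and the corresponding class of functions
can now be defined as

$\Theta = \left\{ \theta : \sum_{j=0}^\infty \sum_k 2^{2sj}\theta_{jk}^2 \le L^2 \right\}$, $\mathcal{F} = \left\{ \sum_{j=0}^\infty \sum_k \theta_{jk}\phi_{jk} : \theta \in \Theta \right\}$,

where $s > 0$. We have already constructed a $\delta$-net and $\delta$-packing set for the
$\ell_2$-bodies in (41), but in the current setting for $\theta \in \Theta_\delta$

$\theta_{jk} = 0$, when $j \ge J + 1$,

where

(49) $2^J \asymp \delta^{-1/s}$,

and for $\theta \in \Theta_\delta^*$

$\theta_{jk} = \theta_{jk}^*$, when $j < J^*$ or $j \ge J + 1$,

where $\theta^*$ is a fixed sequence with $\sum_{j=0}^\infty \sum_k 2^{2sj}\theta_{jk}^{*2} = L^* < L$, and $J^* = J - 1$.

Operator norms. We can apply the wavelet-vaguelette decomposition, as
defined in Donoho, to calculate the operator norms $\varrho(Q, \mathcal{F}_\delta)$ and
$\varrho_K(A, \mathcal{D}_\delta)$. We have available the following three sets of functions: $(\phi_{jk})_{jk}$
is an orthogonal wavelet basis and $(u_{jk})_{jk}$ and $(v_{jk})_{jk}$ are near-orthogonal
sets:

$\left\| \sum_{jk} a_{jk}u_{jk} \right\|_2 \asymp \|(a_{jk})\|_{\ell_2}$, $\left\| \sum_{jk} a_{jk}v_{jk} \right\|_2 \asymp \|(a_{jk})\|_{\ell_2}$,

where $a \asymp b$ means that there exist positive constants $C, C'$ such that
$Cb \le a \le C'b$. The following quasi-singular relations hold:

$A\phi_{jk} = \kappa_j v_{jk}$, $A^*u_{jk} = \kappa_j\phi_{jk}$,

where $\kappa_j$ are quasi-singular values. We assume that there exist positive
constants $q$ and $C_1, C_2$ such that for all $j \in \{0, 1, \ldots\}$

(50) $C_1 \cdot 2^{-qj} \le \kappa_j \le C_2 \cdot 2^{-qj}$.

1. Let $g, g' \in \mathcal{F}_\delta$. Write

$g - g' = \sum_{j=0}^\infty \sum_k (\theta_{jk} - \theta'_{jk})\phi_{jk}$.

Since $Q = (A^{-1})^*$, then $QA^* = (AA^{-1})^* = I$. Thus,

$\langle Q\phi_{jk}, Q\phi_{j'k'} \rangle = \kappa_j^{-1}\kappa_{j'}^{-1}\langle QA^*u_{jk}, QA^*u_{j'k'} \rangle = \kappa_j^{-1}\kappa_{j'}^{-1}\langle u_{jk}, u_{j'k'} \rangle$.

Thus,

(51) $\|Q(g - g')\|_2^2 = \left\| \sum_{j=0}^J \sum_k (\theta_{jk} - \theta'_{jk})Q\phi_{jk} \right\|_2^2$
     $= \left\| \sum_{j=0}^J \sum_k (\theta_{jk} - \theta'_{jk})\kappa_j^{-1}u_{jk} \right\|_2^2$
     $\asymp \sum_{j=0}^J \sum_k (\theta_{jk} - \theta'_{jk})^2\kappa_j^{-2}$
     $\le C\,2^{2qJ} \sum_{j=0}^J \sum_k (\theta_{jk} - \theta'_{jk})^2$,

where we used (50) to infer that when $j \in \{0, \ldots, J\}$, then
$\kappa_j^{-2} \le C_1^{-2} \cdot 2^{2qj} \le C_1^{-2} \cdot 2^{2qJ}$.

On the other hand, $\|g - g'\|_2^2 = \sum_{j=0}^J \sum_k (\theta_{jk} - \theta'_{jk})^2$. This gives the
upper bound for the operator norm

$\varrho(Q, \mathcal{F}_\delta) \le C\,2^{qJ} \le C'\delta^{-q/s}$,

by the definition of $J$ in (49).

2. We have $\langle A\phi_{jk}, A\phi_{j'k'} \rangle = \kappa_j\kappa_{j'}\langle v_{jk}, v_{j'k'} \rangle$ and $(v_{jk})$ is a near-orthogonal
set. Thus, similarly as in (51), we get

$C\delta^{q/s} \le \varrho_K(A, \mathcal{D}_\delta) \le C'\delta^{q/s}$.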

6.1.3. Rates of convergence. We derive the rates of convergence for the
$\delta$-net estimator when the operator is a convolution operator and the Radon
transform. It is also shown that the lower bounds have the same order as
the upper bounds. We give examples in the setting of the Gaussian white
noise model.

Convolution. Let $A$ be a convolution operator: $Af = a * f$, where $a : \mathbf{R}^d \to
\mathbf{R}$ is a known function. Denote

$\phi_{jk}(x) = \prod_{i=1}^d \sqrt{2}\,[(1 - k_i)\cos(2\pi j_i x_i) + k_i\sin(2\pi j_i x_i)]$, $x \in [0, 1]^d$,

where $j \in \{0, 1, \ldots\}^d$, $k \in K_j$, where

$K_j = \{k \in \{0, 1\}^d : k_i = 0 \text{ when } j_i = 0\}$.

The cardinality of $K_j$ is $2^{d - \alpha(j)}$, where $\alpha(j) = \#\{j_i : j_i = 0\}$. The
collection $(\phi_{jk})$, $(j, k) \in \{0, 1, \ldots\}^d \times K_j$, is a basis for 1-periodic functions
on $L_2([0, 1]^d)$. When the convolution kernel $a$ is a 1-periodic function in
$L_2([0, 1]^d)$, then we can write

$a(x) = \sum_{j_1=0,\ldots,j_d=0}^\infty \sum_{k \in K_j} b_{jk}\phi_{jk}(x)$.
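The index set $K_j$ and the trigonometric functions $\phi_{jk}$ can be written out directly. The sketch below enumerates $K_j$ and evaluates $\phi_{jk}$ as displayed above; the function names are hypothetical.

```python
import itertools
import math


def index_set_K(j):
    """All k in {0,1}^d with k_i = 0 whenever j_i = 0."""
    choices = [(0,) if ji == 0 else (0, 1) for ji in j]
    return list(itertools.product(*choices))


def phi(j, k, x):
    """Product over coordinates of sqrt(2) * (cosine if k_i = 0, sine if k_i = 1)."""
    value = 1.0
    for ji, ki, xi in zip(j, k, x):
        angle = 2.0 * math.pi * ji * xi
        value *= math.sqrt(2.0) * ((1 - ki) * math.cos(angle) + ki * math.sin(angle))
    return value


j = (0, 3, 1)                       # one zero coordinate, so alpha(j) = 1
K = index_set_K(j)                  # 2^(3 - 1) = 4 elements
sine_value = phi((1,), (1,), (0.25,))   # sqrt(2) * sin(pi / 2)
```

The restriction $k_i = 0$ when $j_i = 0$ removes the sine term at frequency zero, which would otherwise be identically zero; this is why the cardinality is $2^{d-\alpha(j)}$ rather than $2^d$.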

The functions $\phi_{jk}$ are the singular functions of the operator $A$ and the values
$b_{jk}$ are the corresponding singular values. We assume that the underlying
function space is equal to

(52) $\mathcal{F} = \left\{ \sum_{j_1=0,\ldots,j_d=0}^\infty \sum_{k \in K_j} \theta_{jk}\phi_{jk} : \theta \in \Theta \right\}$,

where

(53) $\Theta = \left\{ \theta : \sum_{j_1=0,\ldots,j_d=0}^\infty \sum_{k \in K_j} a_{jk}^2\theta_{jk}^2 \le L^2 \right\}$.

We give the rate of convergence of the $\delta$-net estimator and show that the
estimator achieves the optimal rate of convergence. Optimal rates of convergence
have been previously obtained for the convolution problem in various settings
in Ermakov (1989), Donoho & Low (1992), Koo (1993), Korostelev & Tsybakov
(1993).

Corollary 1. Let $\mathcal{F}$ be the function class as defined in (52). We
assume that the coefficients of the ellipsoid (53) satisfy

$C_0|j|^s \le a_{jk} \le C_1|j|^s$

for some $s > 0$ and $C_0, C_1 > 0$. We assume that the convolution filter $a$ is a
1-periodic function in $L_2([0, 1]^d)$ and that the Fourier coefficients of the filter $a$
satisfy

$C_2|j|^{-q} \le b_{jk} \le C_3|j|^{-q}$

for some $q > 0$, $C_2, C_3 > 0$. Then,

$\limsup_{n\to\infty} n^{2s/(2s+2q+d)} \sup_{f \in \mathcal{F}} E_f\|\hat f - f\|_2^2 < \infty$,

where $\hat f$ is the $\delta$-net estimator. Also,

$\liminf_{n\to\infty} n^{2s/(2s+2q+d)} \inf_{\hat g} \sup_{f \in \mathcal{F}} E_f\|\hat g - f\|_2^2 > 0$,

where the infimum is taken over all estimators $\hat g$.

Proof. For the upper bound we apply Theorem 1. Let $\mathcal{F}_\delta$ be the $\delta$-net of
$\mathcal{F}$ as constructed in (41). We have shown in (47) that

$\varrho(Q, \mathcal{F}_\delta) \le C\delta^{-a}$,

where $a = q/s$. We have stated in (44) that the cardinality of the $\delta$-net
satisfies

$\log(\#\mathcal{F}_\delta) \le C\delta^{-b}$,

where $b = d/s$. Thus we may apply (16) to get the rate

$\psi_n \asymp n^{-1/(2(a+1)+b)} = n^{-s/(2s+2q+d)}$.

The upper bound is proved. For the lower bound we apply Theorem 2.
Assumption (17) holds because $\mathcal{D}_\delta$ in (41) is a $\delta$-packing set. Assumption
(18) holds by the construction, see (94) in Appendix A. Assumptions (19)
and (20) follow from (44) and (48). Thus the lower bound is proved. □



Radon transform. We consider the 2D Radon transform as defined in (36).
The singular value decomposition of the Radon transform can be found in
Deans (1983). Let

$\tilde\phi_{jk}(r,\theta) = \pi^{-1/2}(j+k+1)^{1/2} Z_{j+k}^{|j-k|}(r)\, e^{i(j-k)\theta}$, $(r,\theta) \in D = [0,1] \times [0,2\pi)$,

where $Z_a^b$ denotes the Zernike polynomial of degree $a$ and order $b$. Functions
$\tilde\phi_{jk}$, $j,k = 0,1,\ldots$, $(j,k) \ne (0,0)$, constitute an orthonormal complex-valued
basis for $L_2(D)$. The corresponding orthonormal functions in $L_2(\mathbf{Y},\nu)$ are

$\tilde\psi_{jk}(u,\varphi) = \pi^{-1/2} U_{j+k}(u)\, e^{i(j-k)\varphi}$, $(u,\varphi) \in \mathbf{Y} = [0,1] \times [0,2\pi)$,

where $U_m(\cos\theta) = \sin((m+1)\theta)/\sin\theta$ are the Chebyshev polynomials of
the second kind. We have

$A\tilde\phi_{jk} = b_{jk}\tilde\psi_{jk}$,

where the singular values are

(54)  $b_{jk} = \pi^{-1}(j+k+1)^{-1/2}$.

We shall identify the complex bases with the equivalent real orthonormal
bases by

$\phi_{jk} = \sqrt{2}\,\mathrm{Re}(\tilde\phi_{jk})$ if $j > k$, $\phi_{jk} = \tilde\phi_{jk}$ if $j = k$, $\phi_{jk} = \sqrt{2}\,\mathrm{Im}(\tilde\phi_{jk})$ if $j < k$.
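The trigonometric identity quoted above for the Chebyshev polynomials of the second kind, $U_m(\cos\theta) = \sin((m+1)\theta)/\sin\theta$, can be checked numerically. The sketch below evaluates $U_m$ through the standard three-term recurrence $U_0 = 1$, $U_1 = 2x$, $U_{k+1} = 2xU_k - U_{k-1}$; the helper name `chebyshev_u` is an assumption introduced here for illustration.

```python
import math


def chebyshev_u(m, x):
    """Chebyshev polynomial of the second kind U_m(x), via the recurrence
    U_0 = 1, U_1 = 2x, U_{k+1} = 2x * U_k - U_{k-1}."""
    if m == 0:
        return 1.0
    prev, cur = 1.0, 2.0 * x
    for _ in range(m - 1):
        prev, cur = cur, 2.0 * x * cur - prev
    return cur


if __name__ == "__main__":
    theta = 0.7  # any angle with sin(theta) != 0
    for m in range(5):
        lhs = chebyshev_u(m, math.cos(theta))
        rhs = math.sin((m + 1) * theta) / math.sin(theta)
        print(m, lhs, rhs)
```

The two printed columns agree up to rounding error for each degree $m$.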
We assume that the underlying function space is equal to

(55)  $\mathcal{F} = \left\{ \sum_{j_1=0,\,j_2=0,\,(j_1,j_2)\ne(0,0)}^{\infty} \theta_{j}\,\phi_{j}(x) : (\theta_j) \in \Theta \right\}$, $j = (j_1,j_2)$,

where

(56)  $\Theta = \left\{ (\theta_j) : \sum_{j_1=0,\,j_2=0,\,(j_1,j_2)\ne(0,0)}^{\infty} a_j^2\,\theta_j^2 \le L^2 \right\}$.

We give the rate of convergence of the $\delta$-net estimator and show that the
estimator achieves the optimal rate of convergence. Optimal rates of convergence
have been previously obtained in Johnstone & Silverman (1990),
Korostelev & Tsybakov (1991), Donoho & Low (1992), and Korostelev & Tsybakov (1993).

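Corollary 2. Let $\mathcal{F}$ be the function class as defined in (55). We assume
that the coefficients of the ellipsoid (56) satisfy

$C_0 |j|^{s} \le a_j \le C_1 |j|^{s}$, $j = (j_1,j_2)$,

for some $s > 0$ and $C_0, C_1 > 0$. Then, for $d = 2$,

$\limsup_{n\to\infty} n^{2s/(2s+2d-1)} \sup_{f\in\mathcal{F}} E_f \|\hat f - f\|_2^2 < \infty$,

where $\hat f$ is the $\delta$-net estimator. Also,

$\liminf_{n\to\infty} n^{2s/(2s+2d-1)} \inf_{\hat g}\sup_{f\in\mathcal{F}} E_f\|\hat g - f\|_2^2 > 0$,

where the infimum is taken over all estimators $\hat g$.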

Proof. For the upper bound we apply Theorem 1. Let $\mathcal{F}_\delta$ be the $\delta$-net of
$\mathcal{F}$ as constructed in (41). We have shown in (47) that

$\varrho(Q,\mathcal{F}_\delta) \le C\delta^{-a}$,

where $a = q/s$ and $q = 1/2$ (so that $a = (d-1)/(2s)$), since the singular
values are given in (54). We have stated in (44) that the cardinality of the
$\delta$-net satisfies

$\log(\#\mathcal{F}_\delta) \le C\delta^{-b}$,

where $b = d/s$. Thus we may apply (16) to get the rate

$\psi_n \asymp n^{-1/(2(a+1)+b)} = n^{-s/(2s+2d-1)}$.

The upper bound is proved. For the lower bound we apply Theorem 2 in the
same way as in the proof of Corollary 1. □
6.2. Additive models. In this section we will show that our approach can
be used to prove oracle results for additive models. In additive models the
unknown function $f : \mathbb{R}^d \to \mathbb{R}$ is assumed to have an additive decomposition
$f(x) = f_1(x_1) + \cdots + f_d(x_d)$ with unknown additive components $f_j : \mathbb{R} \to \mathbb{R}$,
$j = 1,\ldots,d$. We compare this model with theoretical oracle models where
only one component function $f_r$ is unknown, but the other functions $f_j$
($j \ne r$) are known. We will show below that the function $f$ can be estimated
with the same rate of convergence as in the oracle model that has the slowest
rate of convergence. In particular, if the rate of convergence is the same in
all oracle models then the rate in the additive model remains the same. This
is a well-known fact for classical additive regression models, see e.g. Stone
(1985). It efficiently avoids the curse of dimensionality, in contrast to the full
dimensional nonparametric model. Furthermore, it is practically important
because it allows a flexible and nicely interpretable model for regression
with high dimensional covariates, see e.g. Hastie & Tibshirani (1990) for a
discussion of the additive and related models. Thus, our result will generalize
the oracle result for additive models of Stone (1985) to inverse problems. For
a theoretical discussion we will first use a slightly more general framework.
We will come back to additive models afterwards.

6.2.1. Abstract setting. We assume that the function class $\mathcal{F}$ is a subset
of the direct sum of spaces $\mathcal{F}_1,\ldots,\mathcal{F}_p$. All spaces contain functions
$f : \mathbb{R}^d \to \mathbb{R}$. At this stage, we do not assume that functions in $\mathcal{F}_j$ ($j = 1,\ldots,p$)
depend only on the argument $x_j$. An example of this more general setup is
sums of smooth functions and indicator functions of convex sets or of sets
with smooth boundary. We assume that a finite $\delta$-net $\mathcal{F}_\delta$ of $\mathcal{F}$ is a subset
of the direct sum $\mathcal{F}_{1,\delta} \oplus \cdots \oplus \mathcal{F}_{p,\delta}$, where $\mathcal{F}_{j,\delta}$ are finite subsets of $\mathcal{F}_j$. We
denote the number of elements of $\mathcal{F}_{j,\delta}$ by $\exp(\lambda_j)$. Furthermore, we write
$\rho_j = \varrho(Q,\mathcal{F}_{j,\delta})$. We make the following essential geometrical assumption:

(57)  $\|f_1 + \cdots + f_p\|_2^2 \ge c \sum_{j=1}^{p} \|f_j\|_2^2$

for a positive constant $c > 0$. For the $\delta$-net minimizer $\hat f$ over the $\delta$-net $\mathcal{F}_\delta$
we get the following result in the white noise model. (An additive model for
density estimation would not make much sense.)



Theorem 5. We make assumption (57). In the white noise model the
following bound holds for the $\delta$-net minimizer $\hat f$, for $f \in \mathcal{F}$,



E{\\f-fg) <36^ + 32c~'n~^ 



A proof of Theorem 5 is given in Section 7.6.

6.2.2. Application to additive models. We now apply Theorem 5 to discuss
additive models $f(x) = f_1(x_1) + \cdots + f_d(x_d)$. In $L_2(\mathbb{R}^d)$ we have
$\|f_1 + \cdots + f_d\|_2^2 = \sum_{j=1}^{d} \|f_j\|_2^2$, if the functions $f_j$ are normed such that
$\int f_j(x_j)\,dx_j = 0$. Thus (57) holds trivially. Assumption (57) also holds in
other $L_2$-spaces with dominating measure differing from the Lebesgue measure.
A discussion of condition (57) for these classes can be found e.g. in
Mammen et al. (1999). See also Bickel et al. (1993). Such $L_2$-spaces naturally
arise in additive regression models. For a white noise model they come up if one
assumes an additive model for transformed covariables. We assume that for the
models $\mathcal{F}_j$ one can find $\delta_j$-nets $\mathcal{F}_{j,\delta_j}$ such that choosing $\delta_j = \psi_{n,j}$ with

$\psi_{n,j}^2 \asymp n^{-1}\varrho^2(Q,\mathcal{F}_{j,\psi_{n,j}}) \log(\#\mathcal{F}_{j,\psi_{n,j}})$

gives a rate optimal $\delta$-net minimizer in the model $\mathcal{F}_j$. Now,
$\mathcal{F}_\delta = \mathcal{F}_{1,\delta_1} \oplus \cdots \oplus \mathcal{F}_{d,\delta_d}$ is a $\delta$-net of $\mathcal{F}$ with $\delta = \sum_{j=1}^{d} \delta_j$. From Theorem 5 we get that
the $\delta$-net minimizer $\hat f$ over the net $\mathcal{F}_\delta$ achieves the rate $O(\psi_n)$ with
$\psi_n = \max_{1\le j\le d} \psi_{n,j}$. This is just the type of result we called an oracle result at the
beginning of this section.

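In general, the oracle result does not follow from Theorem 1. The application
of Theorem 1 leads to an assumption of the type

$\psi_n^2 \asymp n^{-1} \max_{1\le j\le d} \varrho^2(Q,\mathcal{F}_{j,\psi_n}) \times \max_{1\le j\le d} \log(\#\mathcal{F}_{j,\psi_n})$,

whereas Theorem 5 only requires that

$\psi_n^2 \asymp n^{-1} \max_{1\le j\le d} \varrho^2(Q,\mathcal{F}_{j,\psi_n}) \log(\#\mathcal{F}_{j,\psi_n})$.

This can make a big difference. First of all, the entropy numbers of the
additive classes $\mathcal{F}_j$ may differ. Furthermore, the operator $Q$ may act quite
differently on the spaces $\mathcal{F}_j$.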
6.2.3. Ellipsoidal spaces and convolution. As an example we now assume
that the underlying function space is $\mathcal{F} = \mathcal{F}_1 \oplus \cdots \oplus \mathcal{F}_d$, where

$\mathcal{F}_k = \left\{ \sum_{j=0}^{\infty} \theta_{kj}\,\phi_{kj}(x_k) : \theta_k \in \Theta_k \right\}$

for basis functions $\phi_{kj} : [0,1] \to \mathbb{R}$ and the ellipsoids are defined by

(58)  $\Theta_k = \left\{ \theta_k : \sum_{j=0}^{\infty} a_{kj}^2\,\theta_{kj}^2 \le L^2 \right\}$, $k = 1,\ldots,d$,

where we assume that there exist positive constants $C_1, C_2$ such that for
all $j \in \{0,1,\ldots\}$

(59)  $C_1 j^{s_k} \le a_{kj} \le C_2 j^{s_k}$.

Let $A$ be a convolution operator: $Af = a * f$ where $a : \mathbb{R}^d \to \mathbb{R}$ is a known
function. Then

$Af = A_1 f_1 + \cdots + A_d f_d$,

where $f(x) = f_1(x_1) + \cdots + f_d(x_d)$ and

$A_k f_k(x_k) = \int f_k(x_k - y_k)\, a_k(y_k)\, dy_k$,

where

$a_k(y_k) = \int a(y) \prod_{l \ne k} dy_l$

is the $k$th marginal function of $a$. We can decompose $Q$ accordingly:

$Qg = Q_1 g_1 + \cdots + Q_d g_d$.

Operators $A_j$ and $Q_j$ are restrictions of $A$ and $Q$ to $\mathcal{F}_j$. We apply the singular
value decomposition for $A_k$. Denote

$\phi_{kj}(t) = \sqrt{2}\cos(2\pi j t)$, $t \in [0,1]$,

where $j = 1,2,\ldots$ and $\phi_{k0}(t) = I_{[0,1]}(t)$. The collection $(\phi_{kj})$, $j = 0,1,\ldots$, is a
basis for 1-periodic functions in $L_2([0,1])$. When the $a_k$ are 1-periodic functions
in $L_2([0,1])$, then we can write

$a_k(x_k) = \sum_{j=0}^{\infty} b_{kj}\,\phi_{kj}(x_k)$.

The functions $\phi_{kj}$ are the singular functions of the operator $A_k$ and the
values $b_{kj}$ are the corresponding singular values. We give the rate of convergence
of the $\delta$-net estimator and show that the estimator achieves the
optimal rate of convergence.
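The orthonormality of the cosine system $\phi_{k0} = I_{[0,1]}$, $\phi_{kj}(t) = \sqrt{2}\cos(2\pi j t)$ in $L_2([0,1])$ can be checked numerically. The sketch below is only an illustration: the helper names `phi` and `inner` are hypothetical, and the inner product is approximated by a midpoint rule, which integrates low-frequency trigonometric polynomials on a full period essentially exactly.

```python
import math


def phi(j, t):
    """Cosine basis function: phi_0 = 1 on [0, 1], phi_j = sqrt(2) cos(2 pi j t)."""
    if j == 0:
        return 1.0
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * j * t)


def inner(j1, j2, n_grid=512):
    """Midpoint-rule approximation of the L2([0, 1]) inner product <phi_j1, phi_j2>."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        t = (i + 0.5) * h
        total += phi(j1, t) * phi(j2, t)
    return h * total


if __name__ == "__main__":
    for j1 in range(3):
        print([round(inner(j1, j2), 10) for j2 in range(3)])
```

The printed Gram matrix is the identity up to rounding, as expected for an orthonormal system.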

Corollary 3. Let $\mathcal{F} = \mathcal{F}_1 \oplus \cdots \oplus \mathcal{F}_d$. We assume that the coefficients
of the ellipsoid satisfy (59). We assume that the $a_k$ are 1-periodic functions in
$L_2([0,1])$ and that the Fourier coefficients of $a_k$ satisfy

$C_2 j^{-q_k} \le b_{kj} \le C_3 j^{-q_k}$

for some $q_k > 0$, $C_2, C_3 > 0$. Then, in the white noise model,

$\limsup_{n\to\infty} n^{r} \sup_{f\in\mathcal{F}} E_f \|\hat f - f\|_2^2 < \infty$,

where $\hat f$ is the $\delta$-net estimator and

$r = \min_{k=1,\ldots,d} \frac{2s_k}{2s_k + 2q_k + 1}$.

Also,

$\liminf_{n\to\infty} n^{r} \inf_{\hat g} \sup_{f\in\mathcal{F}} E_f \|\hat g - f\|_2^2 > 0$,

where the infimum is taken over all estimators $\hat g$ in the white noise model.

Proof. For the upper bound we apply Theorem 5. As in Section 6.1.1 we
can find $\delta$-nets $\mathcal{F}_{k,\delta}$ for $\mathcal{F}_k$ whose cardinality is bounded by $\log(\#\mathcal{F}_{k,\delta}) \le
C\delta^{-1/s_k}$ and $\varrho(Q_k,\mathcal{F}_{k,\delta}) \le C\delta^{-q_k/s_k}$. The upper bound of Theorem 5 gives
as the rate the maximum of the component rates $n^{-2s_k/(2s_k+2q_k+1)}$. For the
lower bound we apply the lower bound of Corollary 1 in the case $d = 1$ and
the fact that one cannot do better in the additive model than in the model
that has only one component. □
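To make the oracle statement of Corollary 3 concrete, the following sketch computes the exponent of each one-dimensional component, $2s_k/(2s_k+2q_k+1)$, and takes the minimum, which corresponds to the slowest component rate. The function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def component_exponent(s_k, q_k):
    """Exponent 2 s_k / (2 s_k + 2 q_k + 1) of a single additive component."""
    s_k, q_k = Fraction(s_k), Fraction(q_k)
    return 2 * s_k / (2 * s_k + 2 * q_k + 1)


def additive_exponent(smoothness, ill_posedness):
    """Exponent of the additive model: the minimum over the components,
    i.e. the slowest of the component rates n**(-exponent)."""
    pairs = zip(smoothness, ill_posedness)
    return min(component_exponent(s, q) for s, q in pairs)


if __name__ == "__main__":
    # Two components: (s_1, q_1) = (2, 1) and (s_2, q_2) = (1, 1).
    print(component_exponent(2, 1), component_exponent(1, 1))
    print(additive_exponent([2, 1], [1, 1]))
```

In this example the second component is less smooth and determines the exponent of the whole additive model; the dimension $d$ of the covariate does not enter.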

7. Proofs. 

7.1. A preliminary lemma. We prove that the theoretical error of a minimization
estimator may be bounded by the optimal theoretical error and
an additional stochastic term.

Lemma 1. Let $\mathcal{C} \subset L_2(\mathbb{R}^d)$. Let $\hat f \in \mathcal{C}$ be such that

(60)  $\gamma_n(\hat f) \le \inf_{g\in\mathcal{C}} \gamma_n(g) + \epsilon$,

where $\epsilon > 0$. Then for each $f^0 \in \mathcal{C}$,

$\|\hat f - f\|_2^2 \le \|f^0 - f\|_2^2 + \epsilon + 2\nu_n[Q(\hat f - f^0)]$,

where $f$ is the true density or the true signal function, and $\nu_n(g)$ is the
centered empirical operator:

(61)  $\nu_n(g) = \int g\,dY_n - \int_{\mathbf{Y}} g\,(Af)$ in the white noise model, and
      $\nu_n(g) = n^{-1}\sum_{i=1}^{n} g(Y_i) - \int_{\mathbf{Y}} g\,(Af)$ in density estimation,

where $g : \mathbb{R}^d \to \mathbb{R}$.

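Proof. We have for $g = \hat f$, $g = f^0$,

$\|g - f\|_2^2 - \gamma_n(g) = \|f\|_2^2 - 2\int_{\mathbb{R}^d} fg + 2\int (Qg)\,dY_n$ in the white noise model, and
$\|g - f\|_2^2 - \gamma_n(g) = \|f\|_2^2 - 2\int_{\mathbb{R}^d} fg + 2n^{-1}\sum_{i=1}^{n} (Qg)(Y_i)$ in density estimation.

We have $\int_{\mathbb{R}^d} fg = \int_{\mathbf{Y}} (Af)(Qg)$. Thus,

(62)  $\|g - f\|_2^2 - \gamma_n(g) = \|f\|_2^2 + 2\nu_n(Qg)$.

Thus,

      $\|\hat f - f\|_2^2 - \|f^0 - f\|_2^2 = \|\hat f - f\|_2^2 - \gamma_n(\hat f) + \gamma_n(\hat f) - \|f^0 - f\|_2^2$
(63)  $\le \|\hat f - f\|_2^2 - \gamma_n(\hat f) + \gamma_n(f^0) + \epsilon - \|f^0 - f\|_2^2$
(64)  $= 2\nu_n[Q(\hat f - f^0)] + \epsilon$.

In (63) we applied (60), and in (64) we applied (62) with $g = \hat f$ and with $g = f^0$. □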

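7.2. Proof of Theorem 1. Let $f \in \mathcal{F}$ be the true density. Let $\phi^0 \in \mathcal{F}_\delta$.
Denote

$C = C_1\|\phi^0 - f\|_2^2 + C_2 n^{-1}\varrho^2(Q,\mathcal{F}_\delta)\log_e(\#\mathcal{F}_\delta)$,

where $C_1$ is defined in (11) and $C_2$ is defined in (12). We have that

      $E\|\hat f - f\|_2^2 = \int_0^\infty P(\|\hat f - f\|_2^2 > t)\,dt$
      $\le C + \int_C^\infty P(\|\hat f - f\|_2^2 > t)\,dt$
(65)  $= C + C_2 n^{-1}\varrho^2(Q,\mathcal{F}_\delta)\int_0^\infty P(\|\hat f - f\|_2^2 > C_2 n^{-1}\varrho^2(Q,\mathcal{F}_\delta)\,t + C)\,dt$.

Denote

$\tau_n = C_\tau n^{-1}\varrho^2(Q,\mathcal{F}_\delta)\,(\log_e(\#\mathcal{F}_\delta) + t)$,

where $C_\tau$ is defined in (13). Then,

      $P(\|\hat f - f\|_2^2 > C_2 n^{-1}\varrho^2(Q,\mathcal{F}_\delta)\,t + C)$
      $= P(\|\hat f - f\|_2^2 > C_1\|\phi^0 - f\|_2^2 + C_2 C_\tau^{-1}\tau_n)$
      $= P((1-2\xi)^{-1}\|\hat f - f\|_2^2 > 2\xi(1-2\xi)^{-1}\|\hat f - f\|_2^2 + C_1\|\phi^0 - f\|_2^2 + C_2 C_\tau^{-1}\tau_n)$
(66)  $= P(\|\hat f - f\|_2^2 > 2\xi\|\hat f - f\|_2^2 + (1+2\xi)\|\phi^0 - f\|_2^2 + \xi\tau_n)$.

We have by Lemma 1,

$\|\hat f - f\|_2^2 \le \|\phi^0 - f\|_2^2 + 2\nu_n[Q(\hat f - \phi^0)]$.

Denote

$w(\phi) = \|\phi - f\|_2^2 + \|\phi^0 - f\|_2^2 + \tau_n/2$.

Then we may continue (66) with

      $\le P(\nu_n[Q(\hat f - \phi^0)] > \xi\|\hat f - f\|_2^2 + \xi\|\phi^0 - f\|_2^2 + \xi\tau_n/2)$
      $= P(\nu_n[Q(\hat f - \phi^0)] > w(\hat f)\,\xi)$
      $\le P\left(\max_{\phi\in\mathcal{F}_\delta,\,\phi\ne\phi^0} \frac{\nu_n[Q(\phi - \phi^0)]}{w(\phi)} > \xi\right)$
(67)  $\stackrel{def}{=} P_{\max}$.

We prove that

(68)  $P_{\max} \le \exp(-t)$,

and this proves the theorem, when we combine (65) and (67).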



Proof of (68). Denote



1 «^('/') ■ 
We have that
(69) 

Also, 
and thus 



w 6) > 



Pmax <J2P (^^(9) > ■ 

g&5 



+ Tn]> 



f^r,\ '^'^f II l|2 ^ 1 

(7U) vq = max 2 — — max 

see r„ </,e^6,0^</-o 



-1/2 



\\Q{<P-<Pom _ sHQ,^s) 

\\<P-M\2 



Gaussian white noise. When $W \sim N(0,\sigma^2)$, then we have $P(W > \xi) \le
2^{-1}\exp\{-\xi^2/(2\sigma^2)\}$ for $\xi > 0$, see for example Dudley (1999),
Proposition 2.2.1. We have that $\nu_n(g) \sim N(0, n^{-1}\|g\|_2^2)$. Thus,



p (.„(.) > < 2- exp I - 1^ f < 2- exp I - . 



Thus, denoting Q =^ ^^a/2, 



-Pmaa; < #-^5 " exp 



#.F5-exp{-Q[log,(#.F5)+i]} 



< exp(-t), 
since > 1 by the choice of ^. 
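The Gaussian tail inequality used in this step, $P(W > \xi) \le 2^{-1}\exp\{-\xi^2/(2\sigma^2)\}$ for $W \sim N(0,\sigma^2)$, can be compared with the exact tail probability. The sketch below is an illustration only; it expresses the exact tail through the complementary error function from the Python standard library, and the function names are hypothetical.

```python
import math


def gaussian_upper_tail(x, sigma):
    """Exact P(W > x) for W ~ N(0, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))


def tail_bound(x, sigma):
    """The upper bound (1/2) * exp(-x^2 / (2 sigma^2)) used in the proof."""
    return 0.5 * math.exp(-x * x / (2.0 * sigma * sigma))


if __name__ == "__main__":
    sigma = 1.0
    for x in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(x, gaussian_upper_tail(x, sigma), tail_bound(x, sigma))
```

For every nonnegative threshold the exact tail stays below the bound, with equality at zero.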



Density estimation. Denote v = sup^gg Varj(<^(yi)), and b = sup^gg lbl|oo- 
We have that 



(71) 

by dZO]). Also, 



o 



w 6) > 



and thus, because g{Q,J^s) ^ 1 
(72) 
Applying Bernstein's inequality, and applying (71) and (72),



< exp ■ 

Continuing from ()69p . 



-ne ] 



2{v + j 



where 



= #^5-exp{-Q[log,(#^5)+i]} 
< exp(— t) 



and > 1 by the choice of We have proved ([68]) and thus the theorem. 



□ 



7.3. Proof of Theorem 2. To prove Theorem 2 we follow the approach of
Hasminskii & Ibragimov. We start with a useful lemma.



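Lemma 2. Let $\mathcal{D} \subset \mathcal{F}$ be a finite set for which

(73)  $\min\{\|f - g\|_2 : f, g \in \mathcal{D},\ f \ne g\} \ge \delta$,

where $\delta > 0$. Assume that for some $f_0 \in \mathcal{D}$, and for all $f \in \mathcal{D} \setminus \{f_0\}$,

$P^{(n)}_{Af}\left( \frac{dP^{(n)}_{Af_0}}{dP^{(n)}_{Af}} \ge \tau \right) \ge 1 - \alpha$,

where $0 < \alpha < 1$, $\tau > 0$, and in the density estimation model $P^{(n)}_{Af}$ is the
product measure corresponding to density $Af$, and in the Gaussian white
noise model $P^{(n)}_{Af}$ is the measure of the process $Y_n$. Then,

$\inf_{\hat f} \sup_{f\in\mathcal{D}} E_{Af}\|\hat f - f\|_2^2 \ge \frac{\delta^2}{4}\,(1-\alpha)\,\frac{\tau(N_\delta - 1)}{1 + \tau(N_\delta - 1)}$,

where $N_\delta = \#\mathcal{D} \ge 2$ and the infimum is taken over all estimators (either in
the density estimation model or in the Gaussian white noise model).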
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Proof. Let fn '■ ^ R be an estimator of /. Define a random variable 9 
taking values in D: 

e = argminjgi,||/„ - f^. 

Note that by ([73|) . 

since 9 ^ f for an / E P implies that /„ is closer to some other g ^T) than 
to /. Then, applying also Markov's inequality, 

supi?^/||/„-/||i > rnsc^EAfWfn- fWl 



-lfi$PA]{\\fn-f\\l>5V^ 



The lemma follows by an application of Lemma [3] below. □ 

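Lemma 3. (Tsybakov (1998), Theorem 6.) Let $\hat\theta$ be a random variable
taking values on a finite set $\mathbb{P}$ of probability measures. Denote $\#\mathbb{P} = N$ and
assume $N \ge 2$. Let $\tau > 0$ and $0 < \alpha < 1$. Let for some $P_0 \in \mathbb{P}$ and for all
$P \in \mathbb{P} \setminus \{P_0\}$,

(75)  $P\left( \frac{dP_0}{dP} \ge \tau \right) \ge 1 - \alpha$.

Then

$\max_{P\in\mathbb{P}} P(\hat\theta \ne P) \ge (1-\alpha)\,\frac{\tau(N-1)}{1 + \tau(N-1)}$.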
Proof of Theorem d For /, fo&V^^,f^fo, 



(77) 







Af 




< 


(log 




f (log 




1 (log 



< T 



(76) < (logr-)"'z)i(Pi}\pi}i) 



r ^) nD]^{Af,Afo), density estimation 
r^^) ^ II A/ — A/0II2, Gaussian white noise. 



where in (76) we applied Markov's inequality, and in (77) we applied for the
Gaussian white noise model the fact that under $P^{(n)}_{Af}$,

— ^=exp{nVVZ + naV2}, 
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where Z ~ A^(0, 1) and a = \\Af — A/o||2- When we choose 
r = r„ = ex.p ^-a'^n[CigK{A,V^Jijjnf^ , 
for < a < 1, then applying assumption p^ . 

(n) I '^^Afo ^ , \ . f.^-l 



(78) = a. 

Applying Lemma 2, assumption (17), and (78) we get the lower bound



(79) inf sup ||/-/||^>^^^(l-a) 

where N^^ = #P^„. Let n be so large that logg A''^„ > C^nQ'j^{A,'D^Jtpl, 
where C2 > Ci. This is possible by ([T9|). Then, 



rnN^„ = exp{log^N^^-a ^n[CiQK{A,V^J'ipn?} 
> exp {nel{A,V^J^l^l[C| - a-'Cf]} ^ 00 

as $n \to \infty$, where we apply (20) and choose $\alpha$ so that $C_2^2 - \alpha^{-1}C_1^2 > 0$, that
is, $(C_1/C_2)^2 < \alpha < 1$. Then

hm r4N^^-^) , 

1 + miN^^ - 1) 



and the theorem follows from (79). □

7.4. Proof of Theorem 3. Denote

C = Cie + C2^2^ 

where Ci = (1 - 2^)-^, C2 = 1 - 2^, < ^ < (3 - \/5)/4. We have that 
EWf-fWl = P{\\f-ff2>t) dt 

< c + ^"p(||/-/||^>t) dt 

(80) = C + C2V' (11/ - /Hi > C2^Plt + C) dt. 
Denote
Then, 



(81) 



P[\\f-ff2>C2^lt + C 

= P{\\f-f\\l>C2C;\n + Cie) 
= p((l-20-i/-/||^ 

> 2^(1 - 2^-^/ - fWl + CaC-'^n + Cie 
= p(||/-/||2>2e||/-/i + er„ + e 



We have by Lemma 1, choosing $f^0 = f$,

$\|\hat f - f\|_2^2 \le 2\nu_n[Q(\hat f - f)] + \epsilon$.



Denote 



P 



$w(g) = \|g - f\|_2^2 + \tau_n/2$.

Then we may continue (81) with

ii/-/i>c2^2^+c) 
< p [i^Mf - /)] > m - /111 + ern/2 
= pU[Q{f-f)]>w{m 



(82) 

We prove that 
(83) 



< P sup ^ > t, 

\ge:F w{g) j 

de_f 



Psup < exp(-t • logg 2) 



and this proves the theorem, when we combine (80) and (82).



Proof of (83). We use the peeling device, see for example van de Geer
(2000), page 69. Denote



ao = Tn/2, 



2^3 



j = 0,l,, 
Let $\mathcal{G}_j$ be the set of functions

Gj = {g e : Qj < w{g) < bj} , j = 0,1,... 

and 

^j = {9^:^-\\9-f\\l<bj}, j = 0,l,... 

We have that 

oo 

= {g G : wig) > ao} = [jOj. 

j=0 

Thus, 

Psup < 2^ P sup > ^ 

< f;p( sup MQ{g-f)]>M 

(84) < f;p( supM„[Q(5-/)] >e«i) • 



i=o 



By Assumption 4 of Theorem [21 G('0n) = 24\/2G('(/^„), where G is defined 
in (j95p . for sufficiently large n. Thus, by the choice of C = ^~^4 • 24\/2 in 

By the choice of we have that Cr > 2, and thus = Crip'^{l + t)/2 > tp^. 
Since G{5)/5'^ is decreasing, by Assumption 2 of Theorem [3l then G{6)/6^ 
is decreasing, and 

eni/2/4 > Gi^Pn)/i^l > G{al'^)/a, > G{bf)/b^, 

that is, 

(85) iaj = ibj/A > n-'^''^G{b]'^). 

We may apply Lemma U] given in Appendix IB.H with (|85p to get 

P sup Vn[Q{g- f)] > iaA 



(86) < exp 



< exp {-C7"22j('^+i)nV'^(^+'^)(l + t)i+° } 
(87) < exp{-C"(i + l)nV'^(i+")(l + t)i+''}, 

imsart-aos ver. 2007/12/10 file: invemp-tech.tex date: April 20, 2009 




where C" = C'c~^^'^2'^^''-'^\Cr/2)^+'', and we used the facts a]/b]-'' = 
22{a-i)^i+a ^ 22('^-i)(22Jao)^+" = 2^^''-^') [2^^ Cri^l{l+t) /2]^+'' and22i('^+i) > 
j + 1. When < 6 < 1/2, then J2T=o = ET=i ^ = V(l -b) < 2b. When 
nil^n^^^"^ > (loge2)/C", then exp{-C"ml^n^^^''\l + 1)^+"} < 1/2, and we 
combine (j8l]) and ([87|) to get the upper bound 



2exp{-C"nV'2(i+a)(^^^)i+a| < 2exp{-C7"nV^(i+'^)(l + t)} 

< exp{-tlog,2}. 

We have proved (83) and thus we have proved Theorem 3 up to proving
Lemma 4, which is done in Appendix B.1.
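The last step of the peeling argument relies on the elementary geometric-series bound $\sum_{j\ge 1} b^j = b/(1-b) \le 2b$ for $0 < b \le 1/2$, applied with $b = \exp(-x)$ where $x \ge \log_e 2$. The sketch below illustrates this numerically; the function name and the truncation at 200 terms are assumptions made only for this example.

```python
import math


def peeling_sum(x, terms=200):
    """Truncated sum of exp(-(j + 1) * x) over j = 0, 1, ..., terms - 1."""
    return sum(math.exp(-(j + 1) * x) for j in range(terms))


if __name__ == "__main__":
    for x in (math.log(2.0), 1.0, 3.0):
        # When exp(-x) <= 1/2, the whole sum is at most twice its first term.
        print(x, peeling_sum(x), 2.0 * math.exp(-x))
```

Each printed sum is bounded by the value next to it, which is how a sum of exponential tail bounds over the peeling shells collapses to a single exponential term.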

7.5. Proof of Theorem 4. The proof goes similarly as the proof of Theorem 3
until step (86). At this step we apply Lemma 5 given in Appendix
B.2 to get



P sup lynlQig- f)] > iaj 

< exp - \ +2#gB,exp-' 



, c26]^" j ^1 12 i?^c26]-" + 2eaji?^„/9j ■ 

The first term on the right hand side is handled similarly as in the proof
of Theorem 3. For the second term on the right hand side we have, for
sufficiently large $n$,

— exp < 




12 5ooc26 + 2ia,B'J^ J " [ 12 B^c^a'^'' + 2^5^/9 



1 n^^Q-ag 



< exp<' -— 



12 Sooc2 + 2^5^/9] 
= exp {-nV^2{i+a)22,(;L + , 

since a~'^ = (22-?ao)~'^ < Oq" and > 1 for sufficiently large n, and 
we denote C = i'^Cl+"- /[2^+''l2{BooC^ + 2iB'^/^)]. The proof is finished 
similarly as the proof of Theorem 3.

7.6. Proof of Theorem 5. We proceed similarly as in the proof of Theorem 1.
Choose $f_\delta \in \mathcal{F}_\delta$ such that $\|f - f_\delta\|_2 \le \delta$, where $f$ is the underlying
function in $\mathcal{F}$. Choose $\xi < 1/2$ and put $C = C_1 + C_2$ with
$C_1 = (1-2\xi)^{-1}(1+2\xi)\|f - f_\delta\|_2^2$, $C_2 = \kappa n^{-1}\sum_{j=1}^{p} \rho_j^2\lambda_j$, and $\kappa = 4c^{-1}\xi^{-1}(1-2\xi)^{-1}$.
We have that 

E{\\f-f\\t) < C + P{\\f-f\\l>t) dt 

(88) < C + /^"^(ll/-/ll2>i + C) dt. 

For the integrand of the second term we have that 

p{\\f-f\\l>t + Q) 

= p ((1 - 2e)-^ii/ - /Hi > 2^(1 - 2i)-^\\f - fwi + 1 + c) 
= p (11/ - nl > mf - fwl + (1 - + (1 - 2oc) . 

We now use Lemma [TJ This gives 

\\f-fg<\\f-fs\\l + 2,^n{Q{f-f5))- 

Together with the last equahties this gives 

P{\\f-ff2>t + C) 

< P{\\f- fsWl + 2z^n {Qif - fs)) > 2C\\f - fg + (1 - 20it + O) 

= P{un{Qif-f5)) 

> m -f\\l+ m - fswi + 2-^(1 - 20{t + C2)) 

< P [u^ [Qif - fs)) > 2-1^11/ - fsWl + 2-1(1 - 2C)it + C2)) . 

Now, put Wj = Pj/J2f=iPi and decompose = H h f5,p and / = 

/i + • • • + /p with fsj, fj £ J^j^s- Using assumption ([57|) we get with f3j = 
2-1(1 - 2^){wjt + nn-^pjXj), 

p{\\f-f\\l>t + c) 

< P I E (Qif, - fs,j)) > 2-1^0 X: ll/i - fsjl + E /3. I 

\i=i i=i i=i / 

< E P (i^n {Qif J - fs,,)) > 2-^ic\\fj - fsM + I3j) 

^ E E ^(^^n(Q(5i-A,))>2-^Cc||5,-A,||i + /3,). 
We now use



ne 



2 ; ' 

2 / 



compare to the proof of Theorem 1. This gives



^(ll/-/ll2>i + C 



< E E 2-^exp 

i=i gj^Tj^f, 

< t 2-^exp 

p 

< exp(Aj)2~^ exp 
p 



n 



2\\Q{g,-f5,iW 
' 2\\Qig, - fsM\ 



J2 exp -<c4-\l - 2i)wjp-'^t 



By plugging this into (88) we get



^(11/ -/II 



n^cA-^{l-2i)wjp]H 



dt 



< C + I exp 

< C + En-i4[ec(l-2eH]"V,' 

= c + ri-^4[ec(i-20]"^ 1^2 pj 



Choosing $\xi = 1/4$ gives the statement of Theorem 5.
APPENDIX A: ELLIPSOIDS

The ellipsoid has been defined in (39) and we assume that the $a_j$ satisfy
(40). We make the calculations now in the one dimensional case.

A.1. $\delta$-net. We shall construct a $\delta$-net $\Theta_\delta$ for the ellipsoid $\Theta$ in (39).
The construction is similar to the construction of Kolmogorov & Tikhomirov
(1961). Let

(89)  $M = [(C_0^{-1} 2^{1/2} L \delta^{-1})^{1/s}]$.

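Let $\Theta_\delta(M)$ be a $\delta/2$-net of

$\Theta^{(M)} = \left\{ (\theta_j)_{j\in\{1,\ldots,M\}} : \sum_{j=1}^{M} a_j^2\theta_j^2 \le L^2 \right\}$.

We can choose $\Theta_\delta(M)$ in such a way that its cardinality satisfies

$\#\Theta_\delta(M) \le C\,\frac{\mathrm{volume}(\Theta^{(M)})}{\mathrm{volume}(B_\delta^{(M)})}$,

where $B_\delta^{(M)}$ is a ball of radius $\delta$ in the $M$-dimensional Euclidean space.
Define the $\delta$-net by

$\Theta_\delta = \left\{ (\theta_j)_{j\in\{1,\ldots,\infty\}} : (\theta_j)_{j\in\{1,\ldots,M\}} \in \Theta_\delta(M),\ \theta_j = 0 \text{ for } j \ge M+1 \right\}$.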



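($\delta$-net property.) We prove that $\Theta_\delta$ is a $\delta$-net of the ellipsoid $\Theta$. For each
$\theta \in \Theta$ there is $\theta_\delta \in \Theta_\delta$ such that $\|\theta - \theta_\delta\|_{l_2} \le \delta$. Indeed, let $\theta \in \Theta$. Let
$\theta_\delta \in \Theta_\delta$ be such that

$\sum_{j=1}^{M} (\theta_j - \theta_{\delta,j})^2 \le \delta^2/4$.

Then

$\|\theta - \theta_\delta\|_{l_2}^2 = \sum_{j=1}^{M} (\theta_j - \theta_{\delta,j})^2 + \sum_{j=M+1}^{\infty} \theta_j^2 \le \delta^2$,

where we used the fact

(90)  $\sum_{j=M+1}^{\infty} \theta_j^2 \le C_0^{-2} M^{-2s} \sum_{j=M+1}^{\infty} a_j^2\theta_j^2 \le C_0^{-2} M^{-2s} L^2 \le \delta^2/2$,

because, when $j \notin \{1,\ldots,M\}$, then $a_j^{-2} \le C_0^{-2} j^{-2s} \le C_0^{-2} M^{-2s}$.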

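(Cardinality.) We prove that

$\log(\#\Theta_\delta) \le C\delta^{-1/s}$.

We have that

$\mathrm{volume}(\Theta^{(M)}) = C_M \cdot L^M \prod_{j=1}^{M} a_j^{-1}$

and

$\mathrm{volume}(B_\delta^{(M)}) = C_M \cdot \delta^M$,

where $C_M$ is the volume of the unit ball in the $M$-dimensional Euclidean
space. Thus the cardinality of $\Theta_\delta$ satisfies

$\#\Theta_\delta = \#\Theta_\delta(M) \le C\,\frac{L^M \prod_{j=1}^{M} a_j^{-1}}{\delta^M}$.

We have that

$\prod_{j=1}^{M} a_j^{-1} \le C \prod_{j=1}^{M} j^{-s} = C (M!)^{-s}$.

Applying Feller (1968), pp. 50-53, we get

$M! \ge M^{M+1/2} e^{-M}$.

Thus

      $\log(\#\Theta_\delta)$
      $\le M\log(L) - s\log(M!) + M\log(\delta^{-1}) + C$
      $\le M\log(L) - s(M+1/2)\log M + sM + M\log(\delta^{-1}) + C$
      $\le M(\log(L) + s) - sM\log M + M\log(\delta^{-1}) + C$
      $\le M(\log(L) + s + C') + C$
(91)  $\le \delta^{-1/s} C'' + C$,

since $M = C'''\delta^{-1/s}$.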
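Two ingredients of this construction can be checked numerically: the truncation level $M$ of (89) makes the discarded tail energy at most $\delta^2/2$ as in (90), and the Stirling-type lower bound $M! \ge M^{M+1/2}e^{-M}$ holds. The sketch below is an illustration under assumptions: it takes the integer part in (89) as a ceiling, sets $L = C_0 = 1$ by default, and uses hypothetical function names.

```python
import math


def truncation_level(delta, s, L=1.0, C0=1.0):
    """M from (89), read here as the ceiling of (sqrt(2) * L / (C0 * delta)) ** (1 / s)."""
    return math.ceil((math.sqrt(2.0) * L / (C0 * delta)) ** (1.0 / s))


def tail_energy_bound(M, s, L=1.0, C0=1.0):
    """Bound C0^{-2} M^{-2s} L^2 on the sum of theta_j^2 over j > M, as in (90)."""
    return (L / (C0 * M ** s)) ** 2


def stirling_lower_bound_holds(M):
    """Check log(M!) >= (M + 1/2) log M - M using the log-gamma function."""
    return math.lgamma(M + 1) >= (M + 0.5) * math.log(M) - M


if __name__ == "__main__":
    for delta in (0.5, 0.1, 0.01):
        M = truncation_level(delta, s=1.0)
        print(delta, M, tail_energy_bound(M, s=1.0), delta ** 2 / 2)
    print(all(stirling_lower_bound_holds(M) for M in range(1, 201)))
```

The truncation level grows like $\delta^{-1/s}$, which is the source of the entropy bound $\log(\#\Theta_\delta) \le C\delta^{-1/s}$.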
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A. 2. (5-packing set. For a fixed sequence e* with Ej°^o apf = L* <L 
let 65 (M) be a (5-packing set of 

E*M = I iOjhe{M*,...,M} ■■ E 4^' ^ - ^*)' 1 • 

Here, M* = [M/2]. We can choose 0J(M) in such a way that its cardinality 
satisfies 

(92) iog(#eKM)) > c*r^/^ 

Define 

©5 = {(^.•),G{o,...,oo} : {0j - 0*)j^{M*,...,M} e ©KM), 

(93) ej = e*, for j ^ {Ar,...,M}}. 

The bound (92) follows similarly as the upper bound (91). In the white noise
case one can use this construction with $\theta^* = 0$ and $L^* = 0$. In the density
case another choice of $\theta^*$ may be appropriate to ensure that the functions
in $\mathcal{D}_\delta$ are bounded from above and from below. This would allow one to use the
bound (22) to carry over bounds on Hilbert norms to corresponding bounds
on Kullback-Leibler distances. Note also that a similar calculation as in (90)
shows that for $\theta, \theta' \in \Theta^*_\delta$,

M 00 

(94) \\9 - 9'\\l = E (^^ - ^i)' = E - < CS\ 

i=M' i=M* 

APPENDIX B: LEMMAS RELATED TO EMPIRICAL PROCESS 

THEORY 

B.1. Gaussian white noise. Lemma 4 gives an exponential tail bound
for the Gaussian white noise model.



Lemma 4. Let he the centered empirical operator of a Gaussian white 
noise process. Operator is defined in |g7]) . Let Q C L2(R'^) he such that 
supggg II5II2 < R and denote with Qs a 5-net of Q, 5 > 0. Assume that 
6 I— > g{Q, ^5)\/logg(#^5) is decreasing on (0, i?], where g{Q,Qs) is defined 
in and assume that the entropy integral G{R) defined in (24^ is finite. 
Assume that g{Q, Qs) = cS'"" , where < a < 1 and c > 0. Then for all 

(95) e > n-^l'^ G{R), G{R) = max | 2AV2G{R) ^cR^-" ^log, 2/C' | 
where

(96) C = 12-\C")-^ 
we have 



P supuniQg) > M < exp 



C" = (1 - a)-3/2r(3/2)(log, 2)-3/2, 



2a 



Proof. The proof uses the chaining technique. The chaining technique
was developed by Kolmogorov. An analogous lemma in the direct case is
for example Lemma 3.2 in van de Geer (2000). The basic difference to the
direct case is visible in eq. (98). Let us denote $R_k = 2^{-k}R$, $N_k = \#\mathcal{G}_{R_k}$ and
$H_k = \log_e N_k$, where $k = 0,1,\ldots$. For each $g \in \mathcal{G}$, let $h_g^k$ be a member of the
$R_k$-covering set $\mathcal{G}_{R_k}$ of $\mathcal{G}$ such that $\|g - h_g^k\|_2 \le R_k$. We may write every $g \in \mathcal{G}$ with
telescoping as


fc=i 



where hg = and the convergence is in L2. Let r/^ > be such that 
Efcli % < 1- We will define r]k in (fTnO]l . Then 

(97) P (snpUniQg) >q<Y.P Unpun {Q{h'g - hl'^)) > e% ) ■ 



k=l 



965 



We have 



We have 



# {h^g - h^g-^ : 5 G a} < iVfciVfc-1 < 



max{||Q(/i^'-/i^-i)||^ i^eg} < Tfcmaxl 
(98) < 2,TkRk, 



h'' - h 
"■9 "-g 



k-l 



■■ 9 e 



5} 



where we denote = q{Q,Gri.), when g{Q,Gs) is defined in (p3]) . and we 
used the fact 

||/i^-/i^l2 < ||/i^-5ll2 + ||/i^'-5ll2 <2'^i? + 2-^+1/? = 3/?fc. 

When iV(0 g^) £ > 0, then P{W > < exp{-eV(2f72)}, see for 
example loudlevl ( 19991 ). Proposition 2.2.1. We have that UniQ{hg-h'^~^)) ~ 
N{0,n-^\Q{h^g - /i^-^)||i) and thus 

(99) p(^snpi.n{Q{h'g-h'g-'))>(Vk^<N'k2-'expi^-^-^^ 
Now we choose



rol/2trl/2 ^ 

(100) rjk = 3TkRk max , , c-^R'-HC'k)^/^2 , 



1/2^ 



n 

where C is defined in ([96|) . Then we may apply ([99|) to continue ([97|) with 
an upper bound 

(101) ifexpK-l^l < ^Eexpl-i^l 



(103) < exp 



In (fTOTD we applied ([Ton]) . which imphes 2i?fc < <^r?fc/(4 • when 
we apply the first term in the maximum. In (jl02p we applied also (jlOOp . 
which implies r/^/(4 • S^T^i?!) > C"A;/[c^i?^~^°] where we applied the second 
term in the maximum. In (fT03D we applied that for < 6 < 1/2, = 
V(l - ft) < 26. Here we need that exp {-<2(;7//[g2^2-2a]| < ^^^2, that 

is, > ci?^^" ( ' raC'^ ) "^^^icli is implied by We need to check that 

Efcli% < 1- Since 5 i-^ £'((5, ^5) Vloge(#^5) is decreasing, 

00 00 

(104) Yl TkRkHl'^ = 2 E 2-^-^RT^-UR^\ogMQ2-^R) < 2G(i?). 
fc=i fc=i 

We apply the assumption that Tk = T2-k^ = cR^°-2°'^ to get 
00 00 

fc=i fe=i 

(105) = ci?^-° lim ^3/2 f\l/22-(l-a)Kt 

K^oo Jo 

roo 

= cR^-^il-a)-^/^ u^/^2-''du 
Jo 

(106) = cR^-^C", 

where C" is defined in ([Ml)- We have from (fTOl) and (fT06|) that 

- 8V26G(^ 1 1_ 

2.^^^ ^1/2^ +6VCC <2 + 2-^' 

k=l ^ 

when ^ > 2^^^'^QG{R)n~^/'^ , which is guaranteed by (195^ . and C is chosen 
as in dMl)- The lemma follows from (HH), ([MI), and (fT03D . □ 
B.2. Density estimation. Lemma 5 gives an exponential bound for
the tail probability in the case of density estimation.

Lemma 5. Let Yi,...,Yn G R'^ be i.i.d. with density Af, and let the 
centered empirical process Un be defined in [61]} . Assume that ||j4/||oo < 
Boo. Let Q C L2(R'^) be such that sup^gg \\g\\2 ^ Li. Denote with Qs 6- 
bracketing net of Q, 6 > 0. Denote Qg = {g^ : (g^^g^) € Gs} and = 
{g^ : {g^,g^) G Gs}- Assume that supg^gL^jgu \\Qg\\oo < B'^. Assume that 

S ^ Qden {Q,Qs) Vloge (#^5 ) is decreasing on (0,i?], where Qden{Q,Gs) is 
defined in i flP|) and assume that the entropy integral G{R) defined in 
is finite. Assume that QdeniQiGs) = c6~°', where < a < 1 and c > 0. Then 
for all 

(107) C>n"^^^G{R), 
where 

G{R) = B^/2(g2^9g.2-2a)l/2 

(108) X max {24V2G(i?), 4(log,(2))-i(l - a)-3/2r(3/2)ci?i-'^} , 
we have 

P ^supz/„(Q5) 

where Vn is the centered empirical process defined in Ii61\) . 



Proof. We use the chaining technique with truncation. The basic difference
to the direct case is visible in (115) and (122). The technique was
used in the direct case by Bass (1985), Ossiander (1987), Birgé & Massart
(1993), Proposition 3, and van de Geer (2000), Theorem 8.13. Let us denote



Rk 

denote 
g ^ G, let {h 



2~^R, Nk = #Gr^ and Hk = log^Nk, for k = 0,1,.... Let us 
QdeniQ,GRf,), where QdeniQ,Gs) is defined in ([29]). For each 
be the member of the bracketing net Gr,. , such that 



k,L 
9 



hg'^ < g < hg'^ . We may write every g ^ G with telescoping as 



g = g 



+ 



h 



k~l,L 



k=l 



0,L 
where



mill {o < /c < K - 1 : QA^ > Pk} , if Q^g > Pk for some < A; < - 1 
K, otherwise, 



where K > 1 is defined in (jl23p . 

^0 ""o "'a ' 



and 



(109) p; - " °° "-fc 



k = 



Then, 



+PUupi/„(Q/i°'^) >e/3j 



(110) =^ + + 



Term Pj. We have 

supX:z^n(Q(/ig''^-/i3'-''^)) = sup^/|i,...,,^|(A;)i.„(Q(/i^'^-/i^-i'^)) 

< SUp/{l,...,.,}(/c)l^n (Q(/i^'^ - /i^''^)) . 

Let us denote 
(111) 

f ol/2 r1/2 frl/2 ^ 

r/fc = (9^ + 96 • 2-"^)y'nR, max | ^^^^ , c-'R^-\C'k)y'2 \ , 

where C is defined by 
(112) 

C' = 4-2((^")-2(g2 ^ gg . 2-2«)-i, c"' = (1 - a)-3/2r(3/2)(log, 2)-3/2. 
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We have defined r/^ in (lllip so that r/^ > and J2T=i Vk 1^ ^, which is proved 
in (fTM]) . Then, 

(113) P/ < E P (^sup/|i,...,,^}(A;)^.„ - /i^^-i'^)) > %e/3j ■ 



We have 

(114) 

Also, 



max 



|E|Q(/i^'^-/i^-i'^)|':5Ga 



(115) 
because 



< i?no max 



< BoqT^ niax 



T A;,L _ T^fc— 1,L 
""9 



< 



+ 



Lfc,L _ uk-l,L 
"■g "'9 



When k < Kq, then 



Qih: 



k,L ik-l,L\ 



which implies 
(116) 



g '"g y (^g' ~ ^g 



< 2/3fc_i. 



Thus, applying ()114p . pisp . pi6p . by Bernstein's inequality. 



(117) 
(118) 
(119) 



P I sup/|i,...,«^}(fc)z.„ (^Q(/i^'^ - /i^-i'^)j > e%/3 



< A^fexp 



1 



n(e%/3)^ 



2 S^B^T^Rl + 2/3fc_ie%/9 



1 

< exp < 2Hk — - 



< 



< 



exp 



exp 



2 32(32 + 24 • 22(i-«)/9)PooT|Pfc 
1 n(gr?fc)2 ] 

4 (92 + 96 • 2-^-)B^TlRl j 



c'^B^R^-^^ j ■ 
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In (fTTTl) we applied the fact Pk^i^m < 12Boo2'^^^-''^T^Rl, which follows 
since Tf^Rk = cR]r'^ = 2^^"rfc_|_ii?fe+i, which implies 



(120) 



125oo 2^(1-'^) T^^ii? 



fe+i 



since < % < 1, where r/^ is defined in pil|) . In pi8|) we applied the first 
term in the maximum in (jllip which implies 2Hk < 4~^n(^r/fc)^/[(9^ + 96 • 
2~^")SooT'^-Rf.]. In (jll9[) we applied the second term in the maximum in 
(fml) . which implies r/^/[4 • (9^ + 96 • 2-^'')T^Rl] > C'k/[c^R^-^% We may 
continue (jll3p with an upper bound 



(121) 



oo 

^exp 

k=l 



JbZr^ 



< 2 exp 



c'^B^R^-^^ 



We applied the fact that for < a < 1/2, Efcli a'' = a/{l - a) < 2a. 
Here we need that exp {— n^^C"/[c^-Boo-R^^^"]} < 1/2, that is we need, ^ > 

1 /2 

cBW^R^-" which is implied by (fT07|) . 



Term P//. We have 
and thus 



(qa^«) + 2^|qa 



Here we used the assumption that operator Q preserves positivity (<? > 
implies Qg > 0). We have for A; = 0, ... , K, 



max < E 



QA^l'i^Ggj < Poomax|||QA^|[ : (7Gg 



(122) 



< BoqT^ max 

< BooT^rI- 



A 



When Kg = k, then QA^" > /J^, for = 0, . . . — 1. Thus, for Kg = k, 
k = Q,...,K -I, using (fT09]) . 

EIQA^^I < /3,-iE|QA^p < O^^B^TlRl < i/l2, 



and for Kg = K, 
when we choose

(123) 

Thus 



Define 



K = mm[k>l: UBH^TkRk < • 



P f sup2E|QA^«| > ^/6j = 0. 



gw = {g eg : Kg = k}, k = o,...,K, 

so that g = [jk=oS^''^- Then, 

Pi I < P (snpun (qA^A > < J] P f sup (qA^) > 



(124) 
where 



p(0) , p(l) 
-'^7/ ^11 ' 



p)? = p f sup (qao) > e/e) , pff = f; p f sup (qa^^) > ^/e) . 

We have 

(125) # {a^ : g G < # {a^' : 5 G a} < iVfe. 
It holds that 

(126) \QAl,-EQA^g\<AB'^. 

We have, using p22p . p25p . (|126p . by Bernstein's inequahty. 



(127) 



Pj? < AToexp. 



Let us turn to Pj]^- For Kg = k (that is, when g € ^^'^^), for /c = 1, . . . , K, 



n(e/6)2 



2 PooT2p2 + 2P^e/9 



which implies 
(128) 



gA^ < QA^-i < 



gA^-i^QA^ <2Pk-i. 
Thus, using (122), (125), (128), and the fact that $0 < \eta_k \le 1$, where $\eta_k$ is defined
in (111), by Bernstein's inequality, for $k = 1,\ldots,K$,



P sup MQK) ^ < P\ sup z^n(QA^) > e%/6 

< A/fc exp 



(129) 
(130) 
(131) 



< exp I Hk 



2 B^TlRl + A_ie%/9 j 
1 n{^Tf]kf 

2 62(l + 22(2-a)/3)5ooT2i?2 



< exp 



4 (62 + 48 • 2-^'^)B^TlRl 



In (129) we applied the fact $\beta_{k-1}\xi\eta_k \le 12B_\infty 2^{2(1-a)}T_k^2R_k^2$, which follows by
using (120). In (130) we applied the first term in the maximum in (111), which
implies $H_k \le 4^{-1}n(\xi\eta_k)^2/[(6^2+48\cdot 2^{-2a})B_\infty T_k^2R_k^2]$, since $2^{-1}(6^2+48\cdot 2^{-2a}) \le
9^2 + 96\cdot 2^{-2a}$. In (131) we applied the second term in the maximum in (111),
which implies $\eta_k^2/[4\cdot(6^2 + 48\cdot 2^{-2a})T_k^2R_k^2] \ge C'k/[c^2R^{2-2a}]$. We get



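$$P_{II}^{(1)} \le \sum_{k=1}^{K}\exp\left\{-\frac{k\,n\xi^2C'}{B_\infty c^2R^{2-2a}}\right\} \le 2\exp\left\{-\frac{n\xi^2C'}{B_\infty c^2R^{2-2a}}\right\}. \tag{132}$$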



In (132) we applied that for $0 \le a \le 1/2$, $\sum_{k=1}^{\infty} a^k = a/(1-a) \le 2a$.
Here we need that $\exp\{-n\xi^2C'/[B_\infty c^2R^{2-2a}]\} \le 1/2$, that is, we need
$\xi \ge (\log_e 2/C')^{1/2}\,B_\infty^{1/2}cR^{1-a}n^{-1/2}$, which is implied by (107).



Term $P_{III}$. We have first,



#{/i°'^:5eg}<iVo, 



second 



sup-E g/^O'^ ^ < sup Q/i°'^ ^ < B^T^R^ = B^c^R^-^", 
seg 965 2 



and third 



sup 

see 
Thus, by Bernstein's inequality,

$$P_{III} \le N_0\exp\left\{-\frac{1}{2}\,\frac{n(\xi/3)^2}{B_\infty T_0^2R_0^2 + \xi 2B_\infty'/9}\right\}. \tag{133}$$
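As an illustration of the type of bound used here, the following sketch evaluates the classical Bernstein bound for an average of independent, centered, bounded variables and compares it with a simulated tail probability. The exact form of the inequality, the uniform test distribution, and all function names are assumptions made for this example; they are not taken from the article.

```python
import math
import random

def bernstein_bound(n: int, t: float, sigma2: float, b: float) -> float:
    """exp{-n t^2 / (2 (sigma2 + t b / 3))}: bound on P(average > t)
    for n independent centered terms with variance <= sigma2 and |term| <= b."""
    return math.exp(-n * t * t / (2.0 * (sigma2 + t * b / 3.0)))

def empirical_tail(n: int, t: float, reps: int = 20000, seed: int = 0) -> float:
    """Simulated P(average of n Uniform(-1, 1) variables > t)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mean = sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n
        if mean > t:
            hits += 1
    return hits / reps

if __name__ == "__main__":
    # Uniform(-1, 1) has variance 1/3 and is bounded by 1 in absolute value.
    print(empirical_tail(50, 0.2), bernstein_bound(50, 0.2, 1.0 / 3.0, 1.0))
```

A union bound over a finite class of $N_0$ functions multiplies such a bound by $N_0$, which is the source of the factor $N_0$ in front of the exponential.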



Finishing the proof. The lemma follows from (110), (121), (127), (132), and
(133), after checking some final facts. We need to check that $\sum_{k=1}^{\infty}\eta_k \le 1$.
Applying the calculations in (104) and (106) we get
(134) 

f < (9^ + 96 . ( S^^^^j'2G(i^) ^ < 1 + 1 = 1, 



fc=i 



when i>2- S^/'^QG{R)n-^/'^{o,'i ^ gg . 2-'^''Y/'^bU^ , which is guaranteed by 
(107), and $C'$ is chosen as in (112). □

Remark 7. When in addition $\xi$ satisfies



(135) 2y'441og,(#g^)i3^/2ciii'"n-V2 < ^ < ij^c^i^^d-")/^/^ 
then 

exp <^ - — „ „^„on-.^ , o.r., } < exp 



12 SooC2i?2(l-a) + 2^5^/9 j - \ SooC2i?2(l-a) 

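Indeed, we may continue (127) by

\begin{align*}
P_{II}^{(0)} &\le N_0\exp\left\{-\frac{1}{2}\,\frac{n(\xi/6)^2}{B_\infty T_0^2R_0^2 + 2B_\infty'\xi/9}\right\} \\
&\le \exp\left\{H_0 - \frac{1}{2}\,\frac{n\xi^2}{6^2(1+2/9)B_\infty c^2R^{2(1-a)}}\right\} \tag{136}\\
&\le \exp\left\{-\frac{1}{4}\,\frac{n\xi^2}{44B_\infty c^2R^{2(1-a)}}\right\}. \tag{137}
\end{align*}

In (136) we applied the upper bound in (135) and the fact $T_0R_0 = cR^{1-a}$.
In (137) we applied the lower bound in (135), which implies the fact $H_0 \le
4^{-1}n\xi^2/[44B_\infty c^2R^{2(1-a)}]$. Also, we may continue (133) by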

\begin{align*}
P_{III} &\le N_0\exp\left\{-\frac{1}{2}\,\frac{n(\xi/3)^2}{B_\infty T_0^2R_0^2 + \xi 2B_\infty'/9}\right\} \\
&\le \exp\left\{H_0 - \frac{1}{2}\,\frac{n\xi^2}{3^2(1+2/9)B_\infty c^2R^{2-2a}}\right\} \tag{138}\\
&\le \exp\left\{-\frac{1}{4}\,\frac{n\xi^2}{11B_\infty c^2R^{2-2a}}\right\}. \tag{139}
\end{align*}

In (138) we applied the upper bound in (135). In (139) we applied
the lower bound in (135), which implies $H_0 \le 4^{-1}n\xi^2/(11B_\infty c^2R^{2-2a})$.
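The constants 44 and 11 appearing in Remark 7 come from simple arithmetic, which the following sketch verifies with exact fractions. The function names are hypothetical, and the variable `x` merely stands for the ratio $n\xi^2$ divided by the relevant constant times $B_\infty c^2R^{2(1-a)}$; this is a check of the arithmetic under that reading, not a restatement of the proof.

```python
from fractions import Fraction

def merged_constant(divisor: int) -> Fraction:
    """divisor**2 * (1 + 2/9): the constant obtained after the linear term in the
    Bernstein denominator is bounded by 2/9 times the variance term."""
    return Fraction(divisor) ** 2 * (1 + Fraction(2, 9))

def exponent_after_entropy(h0: float, x: float) -> float:
    """Exponent H_0 - x/2 that remains after writing N_0 = exp(H_0)."""
    return h0 - x / 2.0

if __name__ == "__main__":
    print(merged_constant(6), merged_constant(3))  # threshold xi/6 and threshold xi/3
    # If H_0 <= x/4, then H_0 - x/2 <= -x/4.
    print(exponent_after_entropy(2.5, 10.0))
```

The second function shows why an entropy bound of the form $H_0 \le x/4$ turns the factor $1/2$ in the exponent into $1/4$.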

APPENDIX C: INTRODUCTORY REMARKS 

We add a short introduction to the setting of the article, in order to make 
the article more accessible to PhD students. 

A quite general inverse problem could be described as a problem where
we want to recover a function $f : \mathbf{R}^d \to \mathbf{R}$ when we have only available some
transform $Af$ of the function. An important example is the sampling operator
$Af = (f(x_1), \dots, f(x_n)) \in \mathbf{R}^n$, where $x_1, \dots, x_n \in \mathbf{R}^d$ are fixed points.
Classical methods for recovering $f$ in this case include piecewise constant
interpolation and various ways to linearly interpolate the observed function
values. In statistics some kind of sampling operator is always involved, and
thus recovering $f$ from noisy data $f(x_i) + \epsilon_i$, $i = 1, \dots, n$, where $\epsilon_i$ are error
terms, would not be called an inverse problem in statistics. We mention
three classical statistical inverse problems, where a function $f : \mathbf{R}^d \to \mathbf{R}$ has
to be estimated and $A$ is a fixed operator mapping functions $\mathbf{R}^d \to \mathbf{R}$ to
functions $\mathbf{Y} \to \mathbf{R}$, where $\mathbf{Y}$ is some general space.

1. (Regression function estimation.) We observe data

$$Y_i = (Af)(X_i) + \epsilon_i, \qquad i = 1, \dots, n,$$

where $\epsilon_i \in \mathbf{R}$ are random errors and $X_i \in \mathbf{Y}$ are random design points.

2. (Density estimation.) We observe identically distributed observations

$$Y_1, \dots, Y_n \in \mathbf{Y},$$

whose common density is $Af$.

3. (Gaussian white noise model.) We observe a realization of the process

$$dY_n(y) = (Af)(y)\,dy + n^{-1/2}\,dW(y), \qquad y \in \mathbf{Y},$$

where $W(y)$ is a Wiener process on $\mathbf{Y}$. When $\mathbf{Y} = \mathbf{R}$, then we can
define the process by

$$Y_n(y) = \int_{-\infty}^{y} (Af)(t)\,dt + n^{-1/2}W(y),$$

where $W$ is the Brownian motion, or Wiener process, on the real line.
The Gaussian white noise model is rather close to regression function
estimation when the error terms $\epsilon_i$ are Gaussian and the design
points $X_i$ are uniformly distributed in the unit square. However, in the
Gaussian white noise model we have eliminated the problems related
to interpolation, since the function $Af$ is observed continuously and
not at a finite number of design points. Since the assumption of continuous
observation is quite far from reality, we can use inference in the
Gaussian white noise model only as a first approximation. In addition,
the assumption of the exact Gaussian distribution is very restrictive.
Due to the central limit theorem the Gaussian white noise model is a
relevant approximation also for the model of density estimation and
for the model of regression function estimation under non-Gaussian
noise.
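The closeness of the Gaussian white noise model to regression with Gaussian errors can be illustrated by discretizing the process. The sketch below assumes $\mathbf{Y} = [0,1]$, takes the transformed function $Af$ as a given Python callable `af`, and approximates the integral over each bin by the midpoint rule; the function name and all parameter values are hypothetical.

```python
import math
import random

def white_noise_increments(af, n: int, bins: int, seed: int = 0):
    """Rescaled increments of Y_n over `bins` equal cells of [0, 1].

    Each increment is (integral of Af over the cell) + n^{-1/2} * (Brownian increment).
    Dividing by the cell width h gives Af(midpoint) + N(0, 1 / (n h)) noise,
    which has the form of a regression observation with Gaussian errors.
    """
    rng = random.Random(seed)
    h = 1.0 / bins
    observations = []
    for j in range(bins):
        mid = (j + 0.5) * h
        drift = af(mid) * h                        # midpoint rule for the integral of Af
        noise = rng.gauss(0.0, math.sqrt(h / n))   # n^{-1/2} (W(y_{j+1}) - W(y_j))
        observations.append((mid, (drift + noise) / h))
    return observations

if __name__ == "__main__":
    for mid, value in white_noise_increments(lambda y: y * y, n=1000, bins=5):
        print(round(mid, 2), round(value, 3))
```

The noise variance $1/(nh)$ of a rescaled increment grows as the cells shrink, which reflects that the continuous observation itself carries no interpolation error but cannot be evaluated pointwise.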

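Let us now consider the estimation of a regression function (item 1 of the
above list). A common approach for regression function estimation is to find
the estimator $\hat f$ as a solution of the minimization problem

$$\hat f = \operatorname{argmin}_{g \in \mathcal{F}} \sum_{i=1}^{n} \bigl(Y_i - (Ag)(X_i)\bigr)^2, \tag{140}$$

where $\mathcal{F}$ is some class of functions $\mathbf{R}^d \to \mathbf{R}$. Note that the estimator $\hat f$
reduces to the linear regression estimator

$$\hat f(x) = \hat\beta_0 + \hat\beta_1^T x, \qquad (\hat\beta_0, \hat\beta_1) = \operatorname{argmin}_{\beta_0 \in \mathbf{R},\, \beta_1 \in \mathbf{R}^d} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1^T X_i)^2,$$

when $A$ is the identity operator and $\mathcal{F} = \{\beta_0 + \beta_1^T x : \beta_0 \in \mathbf{R},\ \beta_1 \in \mathbf{R}^d\}$.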
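The estimator $\hat f$, defined in (140), can be defined also as

$$\hat f = \operatorname{argmin}_{g \in \mathcal{F}} \left( -\frac{2}{n}\sum_{i=1}^{n} Y_i\,(Ag)(X_i) + \frac{1}{n}\sum_{i=1}^{n} (Ag)^2(X_i) \right). \tag{141}$$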

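The estimator which we have considered can be defined, assuming now for
simplicity that the design points $X_i$ have a known distribution $\nu$ on $\mathbf{Y}$, as

$$\hat f = \operatorname{argmin}_{g \in \mathcal{F}} \left( -\frac{2}{n}\sum_{i=1}^{n} Y_i\,(Qg)(X_i) + \int g^2 \right), \tag{142}$$

where $Q = (A^{-1})^*$ is the adjoint of the inverse, for the space $L_2(\nu)$. Note
that when $g : \mathbf{R}^d \to \mathbf{R}$, then $Qg : \mathbf{Y} \to \mathbf{R}$.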
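That the least squares criterion (140) and the reduced criterion (141) have the same minimizer can be checked on simulated data: the two criteria, after dividing (140) by $n$, differ only by the term $n^{-1}\sum_i Y_i^2$, which does not depend on $g$. The sketch below assumes for simplicity that $A$ is the identity operator and that the class is a small finite family of lines through the origin; the data generating model, the candidate family, and all names are hypothetical.

```python
import random

def sum_of_squares(g, xs, ys):
    """Criterion (140) with A the identity: sum of squared residuals."""
    return sum((y - g(x)) ** 2 for x, y in zip(xs, ys))

def reduced_criterion(g, xs, ys):
    """Criterion (141) with A the identity: the Y_i^2 term has been dropped."""
    n = len(xs)
    cross = sum(y * g(x) for x, y in zip(xs, ys))
    square = sum(g(x) ** 2 for x in xs)
    return -2.0 / n * cross + 1.0 / n * square

def argmin(criterion, candidates, xs, ys):
    """Index of the candidate minimizing the given criterion."""
    return min(range(len(candidates)), key=lambda j: criterion(candidates[j], xs, ys))

def make_data(n=200, seed=1):
    """Simulated regression data Y = 2 X + noise with X uniform on [0, 1]."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys

# Finite candidate class: lines through the origin with the listed slopes.
CANDIDATES = [lambda x, s=s: s * x for s in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]

if __name__ == "__main__":
    xs, ys = make_data()
    print(argmin(sum_of_squares, CANDIDATES, xs, ys),
          argmin(reduced_criterion, CANDIDATES, xs, ys))
```

The estimator (142) has the same structure, with $(Ag)(X_i)$ in the cross term replaced by $(Qg)(X_i)$ and the empirical square term replaced by $\int g^2$.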

When operator $B : H_1 \to H_2$ is defined as a mapping from a Hilbert
space $H_1$ to another Hilbert space $H_2$, then the adjoint $B^*$ is defined as
the operator satisfying the equality

$$\langle Bx, y\rangle_2 = \langle x, B^*y\rangle_1,$$

where $\langle\cdot,\cdot\rangle_i$ are the inner products of the Hilbert spaces. In the case when the
Hilbert spaces are the Euclidean space, $H_1 = H_2 = \mathbf{R}^d$, the operators
are $d \times d$ real matrices, and we have $\langle Bx, y\rangle = \langle x, B^Ty\rangle$, where $B^T$ is the
transpose of matrix $B$, and thus the adjoint is equal to the transpose. We
have given further examples of adjoints in (32), where the adjoint of the
inverse of a convolution operator is given, and in (34), where the adjoint of
the inverse of the Radon transform is given.
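In the Euclidean case these identities can be verified directly. The sketch below checks $\langle Bx, y\rangle = \langle x, B^Ty\rangle$ and, for an invertible $2 \times 2$ matrix $A$ with $Q = (A^{-1})^T$, the identity $\langle f, g\rangle = \langle Af, Qg\rangle$ that underlies the estimator. The helper names and the example matrix are illustrative and not taken from the article.

```python
def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def inner(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix; raises ValueError if it is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def adjoint_gap(b, x, y):
    """<Bx, y> - <x, B^T y>, which should vanish up to rounding."""
    return inner(matvec(b, x), y) - inner(x, matvec(transpose(b), y))

if __name__ == "__main__":
    A = [[2.0, 1.0], [0.0, 3.0]]
    Q = transpose(inverse_2x2(A))      # Q = (A^{-1})^*, here a transpose
    f, g = [1.0, 2.0], [3.0, -1.0]
    print(inner(f, g), inner(matvec(A, f), matvec(Q, g)))
```

The second identity holds because $\langle Af, Qg\rangle = \langle A^{-1}Af, g\rangle = \langle f, g\rangle$.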

The estimator defined by (140) and (141) seems quite natural, but we can
justify the estimator defined in (142) by the following calculation. We have,
similarly as in ([5]),

\begin{align*}
\|f - g\|_2^2 - \|f\|_2^2 &= -2\int fg + \|g\|_2^2 \\
&= -2\int_{\mathbf{Y}} (Af)(Qg)\,d\nu + \|g\|_2^2 \\
&\approx -\frac{2}{n}\sum_{i=1}^{n} Y_i\,(Qg)(X_i) + \|g\|_2^2 .
\end{align*}

The last approximation in the above calculation uses the fact that the
distribution of the design points is $\nu$.
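The approximation in this calculation can be examined by simulation in a toy case. The sketch below assumes $\mathbf{Y} = [0,1]$ with $\nu$ the uniform distribution, the hypothetical operator $(Af)(x) = Cf(x)$ for a constant $C$, so that $(Qg)(x) = g(x)/C$, and the functions $f(x) = x$ and $g(x) = x^2$. None of these choices come from the article; they are made so that every integral has a closed form.

```python
import random

C = 2.0  # hypothetical operator: (A f)(x) = C * f(x)

def f(x):
    """True function in the toy example."""
    return x

def g(x):
    """Candidate function in the toy example."""
    return x * x

def apply_a(func):
    """A multiplies a function by C."""
    return lambda x: C * func(x)

def apply_q(func):
    """Q = (A^{-1})^*; multiplication by 1/C is self-adjoint in L2(nu)."""
    return lambda x: func(x) / C

def empirical_contrast(n, noise_sd=0.5, seed=0):
    """-(2/n) sum_i Y_i (Qg)(X_i) + ||g||^2 with Y_i = (Af)(X_i) + Gaussian noise."""
    rng = random.Random(seed)
    af, qg = apply_a(f), apply_q(g)
    total = 0.0
    for _ in range(n):
        x = rng.random()                       # design point drawn from nu
        y = af(x) + rng.gauss(0.0, noise_sd)   # noisy observation of Af
        total += y * qg(x)
    norm_g_sq = 1.0 / 5.0                      # integral of x^4 over [0, 1]
    return -2.0 * total / n + norm_g_sq

# ||f - g||^2 - ||f||^2 = 1/30 - 1/3 for f(x) = x and g(x) = x^2 on [0, 1].
TARGET = 1.0 / 30.0 - 1.0 / 3.0

if __name__ == "__main__":
    print(empirical_contrast(100000), TARGET)
```

Minimizing the empirical contrast over $g$ therefore approximately minimizes $\|f - g\|_2^2$, since the subtracted term $\|f\|_2^2$ does not depend on $g$.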